Preface



Computational physics is the field in physics that has experienced probably 
the most rapid growth in the last decade. With the advent of computers, a 
new way of studying the properties of physical models became available. One 
no longer has to make approximations in the analytical solutions of models 
to obtain closed forms, and interesting but intractable terms no longer have 
to be omitted from models right from the beginning of the modeling phase. 
Now, by employing methods of computational physics, complicated equations 
can be solved numerically, simulations allow the solution of hitherto intractable
problems, and visualization techniques reveal the beauty of complex as well as 
simple models. Many new and exciting results have been obtained by numerical 
calculations and simulations of old and new models. 

This book presents samples of many of the facets that constitute computa- 
tional physics. Our aim is to cover a broad spectrum of topics, and we want to 
present a mixture ranging from simple introductory material including simple 
exercises to reports of serious applications. This is not meant to be an intro- 
ductory textbook on computational physics, nor is it a proceedings volume of 
a research conference. This book instead provides the reader with an overview 
of computational physics, its basic methods, and its many areas of application. 
Our coauthors lead the reader into new and “hot” topics of research, but the 
presentation does not require any specific knowledge of the topics and methods. 
We hope that a reader who has gone through the book can appreciate the wealth 
of computational physics and is motivated to proceed with further reading. 

The topics in this book span a wide spectrum, with a coarse division
into “Monte Carlo” type and “molecular dynamics” type chapters. We start by
discussing random numbers and their generation on computers. These random
numbers are then used in a variety of applications, which center around “Monte
Carlo methods”. In these applications the focus is first on classical systems in
physics, chemistry, biology, materials science, and optimization. Then
quantum-mechanical problems are investigated by Monte Carlo procedures. On our way
we also encounter quantum chaos and fractal concepts, which are of increasing
importance nowadays. The transition from “Monte Carlo” to “molecular dynamics”
occurs in the chapter on hybrid methods, which combine elements of
both. Then “molecular dynamics” methods are presented, with fluids and solids
covered. A chapter on finite-element methods follows, and the two final chapters
present principles of parallel computers and associated programming models.

As usual in physics, only active interaction with the matter at hand provides
deep insight, and thus we include a diskette that contains sample programs and
demonstrations to support the interaction of the reader with the text. The sam- 
ple programs and demonstrations are selected to provide a glimpse of current 
research activities, even though the limitations of the available hardware and/or 
the limited patience of some readers might require a reduction in the dimension- 
ality or size of the application. Also some exercises are included to further foster 
an active use of this book. 

The material in this book is born out of lectures the authors gave at a Her- 
aeus Summer School on computational physics at the Technical University in 
Chemnitz. The aim of the summer school was the same as the aim of this book: 
to give a sampler of the field. Due to the gracious funding by the Dr. Wilhelm 
Heinrich Heraeus and Else Heraeus Foundation the editors (see figure) were able 
to present two weeks of intense lecturing and “learning by doing” to more than 
80 students. We would like to use this opportunity to thank the Heraeus Foun- 
dation for making the summer school and this book possible. 

Most importantly, we would like to thank our coauthors for their contributions to
this volume (as well as for their lectures at the summer school). We very much
appreciate their willingness to contribute even under the severe limitations that
their everyday teaching and research activities (and administrative duties) put
on their time. Finally, we thank Jorg Arndt, Peter Blaudeck, Andre Fachat,
Goran Hanke, Karin Kumm, Sven Schubert, and Peter Spaht for their technical
help and Springer-Verlag for making this volume a reality.

Chemnitz, December 1995 

Karl Heinz Hoffmann and Michael Schreiber 




With this original answer to the question “How to measure the height of the building 
of the Institut für Physik in Chemnitz with a computer and a stop watch only?” the
editors give a peculiar interpretation of the topic “Physics with a computer”.

Random Number Generation*



Dietrich Stauffer 

Institut für Theoretische Physik, Universität zu Köln, D-50923 Köln, Germany
e-mail: stauffer@thp.uni-koeln.de



Abstract. The sad situation of random number generation is reviewed: there are no 
good random numbers. But life has to go on anyhow, and thus we explain how to 
produce reasonable random numbers efficiently, emphasizing multiplication with 16807 
and the Kirkpatrick-Stoll R250 generator. 



1 Introduction 

Molecular Dynamics and Monte Carlo are the two standard simulation methods 
of the last decades. Monte Carlo simulations use random numbers to produce 
random fluctuations. Today, they are no longer made at the roulette tables in 
Monaco, but on computers. In the good old days, people printed tables of random 
numbers from which the user could read them off. This, of course, is somewhat 
tedious when simulating a square lattice of size one million times one million, 
today’s world record [1]. About a decade ago, computer chips became available 
which produced random numbers through the thermal noise of the electrons, 
about one number per microsecond. This is not fast enough for many quality 
applications. Besides, for testing purposes we would like to have reproducible 
random numbers: when we have made a program more efficient without changing 
the results, we want to run it again and indeed get exactly the same results, 
and not just roughly the same, within the statistical errors of the Monte Carlo 
simulations. Moreover, when we switch from one computer to another, we would 
like again to get the same results: portability is important. Thus special chips 
using thermal noise are not suitable for this purpose. 

Also, the random numbers should be produced quickly since Monte Carlo 
simulations consume lots of time and we never have enough of it. Thus we need 
efficient methods, and on many computers it is very slow to call a function 
or subroutine to produce one random number. Thus a good random number 
generator should be: 

(1) random 

(2) reproducible 

(3) portable 

(4) efficient 

Using the built-in random number generator of your computer can make your
program inefficient and nonportable. (Seymour Cray knew what he was doing:
his random number generators for the good old CDC series or modern Crays
were efficient.) Besides, the user then does not understand what is going on.

* Software included on the accompanying diskette.

Thus we now review why the above criteria are difficult to fulfill and what 
to do about it, by programming your own random numbers. 

2 The Miracle Number 16807 

Linear congruential random number generators multiply the last random integer 
by some big factor, add another integer to it, treat the sum modulo some power 
of two, and normalize this integer to the interval between zero and unity. This 
all sounds very complicated, sometimes is presented in this complicated fashion 
in the literature, and may cause you to give up programming your own ran- 
dom numbers. Thus simply forget these complications and look at the following 
Fortran or Basic statement, which works for most 32-bit machines: 

      IBM = IBM*16807

(fans of Pascal and C should end this line with a semicolon, and enemies of
International Business Machines may use a different variable name). If you
start with an odd integer for IBM, e.g., through IBM = 2*ISEED-1, then this
single program line should give you, again and again, integers IBM distributed
randomly between $-2^{31}$ and $2^{31}$. Just try it out. Why does it work?

If you multiply two ten-digit integers, the result will be an integer with about 
twenty digits and is difficult to obtain by paper and pencil. You may estimate, 
however, the leading digit correctly without too much effort. On a computer, 
you may not be able to store more than ten digits for each integer. Then most
computers simply throw away, without any error message, the somewhat pre- 
dictable leading digits and keep only the ten least significant digits. Of course, 
computers work with binary digits (bits) and not with decimal ones, and with 
four-byte integers (32 bits) all leading bits beyond the least significant 32 bits 
are thrown away if the product of two integers has more than 32 bits. In terms of 
decimal numbers restricted to be at most 999, this would mean that the product 
of 123 and 899 is not 110577 but merely 577. It is clear that these least significant 
digits are difficult to predict, that means for a user they look pretty random. 

In your youth you have learned that a*b equals b*a, and that the product of 
two positive numbers is again positive. In linear algebra or quantum mechanics 
you found out that the first statement was a lie, and now you realize the same for 
the second statement: IBM*16807 may be negative even when IBM was positive.
The reason is that the first (most significant) bit of an integer indicates the 
sign. Thus before the leading bits of the product were thrown away, the product 
was positive; but then only the last 32 bits were kept, and the leftmost (most 
significant) bit may be zero (positive 32-bit number) or one (negative 32-bit 
number). So, plus times plus is minus, in about half the cases. 
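The wraparound described above can be imitated in a language with unbounded integers by explicitly keeping only the lowest 32 bits. The following is a minimal sketch, not the author's program: the helper names `to_int32`, `next_ibm`, and `sample` are hypothetical, and the masking step stands in for what a 32-bit machine does silently on overflow.

```python
def to_int32(x):
    """Keep the 32 least significant bits of x and read them as a signed integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def next_ibm(ibm, multiplier=16807):
    """One step of IBM = IBM*16807; bits beyond the lowest 32 are discarded."""
    return to_int32(ibm * multiplier)


def sample(iseed, count):
    """Return `count` successive integers, starting from the odd value 2*ISEED-1."""
    ibm = 2 * iseed - 1
    values = []
    for _ in range(count):
        ibm = next_ibm(ibm)
        values.append(ibm)
    return values


# Decimal analogue from the text: with only three digits kept, 123*899 gives 577.
print((123 * 899) % 1000)
print(sample(1, 5))
```

Because an odd number times an odd number is odd, every value stays odd, which is why the least significant bit carries no randomness.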

Some ancient DEC computers may not have liked this overflow above the 32-bit
limit, but otherwise I am not aware of computers where the above Fortran
statement causes trouble. Thus we have not only an efficient one-line random
number generator, but also a portable one.

If for some reason you want only positive random numbers IBM, then you
have to add $2^{31}$ to them if they are negative. This number $2^{31}$ is too large to
handle for the 32-bit computer, but $2^{31} - 1 = 2147483647$ is fine. Thus try

      IF (IBM.LT.0) IBM = IBM + 2147483647 + 1

and it works if the computer is too stupid to find out that you really want to
add $2^{31}$.

If you want to normalize this number to the interval between 0 and 1, you
multiply a positive random integer by $2^{-31} = 4.656612 \times 10^{-10}$. If they are both
positive and negative, use

      Z = 0.5 + 2.328306E-10 * IBM

to get a random number z between 0 and 1. Of course, this normalization from
an integer to a real number costs a lot of computer time; you could do it faster
by learning how a floating point number is stored in your computer and then
constructing one via bit operations treating the random integer as a bit string
for the mantissa.
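The two normalizations can be checked with a few lines. This is a sketch under the assumption that the random integers are signed 32-bit values as above; the function names are hypothetical.

```python
TWO_POW_MINUS_31 = 2.0 ** -31   # about 4.656612e-10
TWO_POW_MINUS_32 = 2.0 ** -32   # about 2.328306e-10


def unit_from_signed(ibm):
    """Map a signed 32-bit integer (-2**31 <= ibm < 2**31) to 0 <= z < 1."""
    return 0.5 + TWO_POW_MINUS_32 * ibm


def unit_from_positive(ibm):
    """Map a non-negative integer (0 <= ibm < 2**31) to 0 <= z < 1."""
    return TWO_POW_MINUS_31 * ibm


print(unit_from_signed(-2**31), unit_from_signed(0), unit_from_signed(2**31 - 1))
```

The most negative integer maps to exactly 0 and the largest positive one to just below 1, so the result never reaches unity.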

However, in most cases this normalization is not needed, and you may stay
within integer arithmetic. For example, some command GOTO 1 should be executed
with probability p. Normally this is done with

      IF (Z.LE.P) GOTO 1

requiring a random number between 0 and 1. This normalization is avoided by

      IF (IBM.LE.IP) GOTO 1

provided you have defined once (and not millions of times, i.e., for each random
number) the variable

      IP = 2147483648.0*(2.0*P-1.0)
      IF (P.GT.0.999) IP = 2147483647
      IF (P.LT.0.001) IP = -2147483648

which varies between $-2^{31}$ and $2^{31}$. Now the computer runs faster. (The last two
"if" statements are precautions, seldom needed, in case rounding errors cause
trouble in the conversion to integers if p = 0 or p = 1.)
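To see that the integer comparison reproduces the intended probability, one can count how many of the $2^{32}$ possible signed integers satisfy IBM ≤ IP. The sketch below mirrors the three Fortran lines; `threshold` and `acceptance_fraction` are hypothetical helper names, and the count assumes that IBM is uniformly distributed over all 32-bit integers, which the real generator only approximates.

```python
def threshold(p):
    """Integer IP for probability p, with the same precautions as in the text."""
    if p > 0.999:
        return 2147483647
    if p < 0.001:
        return -2147483648
    return int(2147483648.0 * (2.0 * p - 1.0))


def acceptance_fraction(ip):
    """Fraction of all signed 32-bit integers IBM that satisfy IBM <= IP."""
    return (ip + 2**31 + 1) / 2**32


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    ip = threshold(p)
    print(p, ip, acceptance_fraction(ip))
```

Note that the two precautions treat every p above 0.999 as 1 and every p below 0.001 as 0, so the sketch follows the text rather than offering a finer treatment of extreme probabilities.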

The number $16807 = 7^5$ is not entirely arbitrary; historically earlier was
65539, and 65549 has also been used. So you may mix them, using in most of
your program lines multiplication with 16807, but sometimes also 65539. Do not
try to produce different samples just by changing the multiplier from 16807
to 16809, then 16811, and so on. Also, your IBM numbers must always be odd
integers; to be safe I start with an integer ISEED and then state once IBM =
2*ISEED-1, as mentioned already above.

If you simulate at zero temperature (see Sect. 4), then the probabilities are
0, 1/2, and 1 only. With integer random numbers IBM varying between $-2^{31}$
and $+2^{31}$ the conditions and Boltzmann integers then have to be formulated
exactly as stated above (not IBM.LT.IP, for example), to avoid a spin flipping
when it should not flip. With floating point random numbers you need double
precision real*8. This detail may be important for the fraction of frozen spins in
spin glasses [7].

3 Bit Strings of Kirkpatrick-Stoll

The principle of Tausworthe shift generators has been around for a long time but
physicists started to use it mainly after Kirkpatrick and Stoll made it popular 
in a physics journal [2]. For many years it was regarded as superior to multi- 
plication with 16807; this is no longer true but at least it offers a completely 
different alternative. It requires bit-manipulation functions which had not yet 
been standardized before Fortran 90 (they are part of the C language standard) 
and which were initially demanded by the Pentagon for signal analysis. 

Imagine you have two 32-bit integers M and N. Then the exclusive-or oper- 
ation puts a bit equal to one if and only if the two corresponding bits in M and 
N are different; otherwise exclusive-or puts this bit to zero. Thus this bit-by- 
bit exclusive-or IEOR(M,N) (Cray called it M.xor.N but unfortunately this did
not become the standard) treats 32 bits in parallel and does for each of these
bits what a logical operation would do for one bit only with logical (Boolean)
variables. Obviously such bit-handling operations can be used in lots of problems
where the essential information consists of independent bits, such as in
Ising models or cellular automata where it is called multispin coding [3]. Fortran
manuals usually hide these tricks in an appendix on the functions which the 
compiler has stored. 

Imagine you have an array of 250 integers N consisting of completely random
bits. Then the next integer N(251) is produced via N(251) = IEOR(N(1),
N(148)) and generally

      N(K) = IEOR(N(K-250), N(K-103))

where again 250 and 103 are magic numbers which should not be changed. An
alternative choice is the simple subtraction

      N(K) = N(K-250) - N(K-103)

but this is less widespread than the exclusive-or method, also called R250.

To work with it we first need 250 random integers. It is not recommended
to take the results of IBM*16807 directly as such integers since the last bits are
not random enough; for example the least significant bit is always one since IBM
is always odd. Instead we set a bit in N equal to one if and only if the result of
IBM*16807 is negative. Thus our 32-bit integers N are initialized through

      DO K = 1, 250
        ICI = 0
        DO I = 1, 32
          ICI = ISHFT(ICI,1)
          IBM = IBM*16807
          IF (IBM.LT.0) ICI = ICI+1
        ENDDO
        N(K) = ICI
      ENDDO

Here again ISHFT is a bit-manipulation function shifting the first argument by
one bit to the left. Instead of ICI=ICI+1, one could also have used a bit-by-bit
or-function ICI=IOR(ICI,1); on most compilers integer and bitstring operations
can be mixed. (Cray vector computers may even have separate hardware for
addition and bit handling, and then run faster if one line contains a mixture
of bit and arithmetic operations since now the various hardware parts work in
parallel.)

Had we used R250 as described above, we would need a huge array N(K)
of random integers. To avoid this explosion of memory demand it is practical to
recycle the indices k; that means to start counting again with k = 1, 2, 3 when
possible. So we could treat k, k - 103, k - 250 modulo 250 but we would waste
time. It is usually faster to treat these three numbers modulo 256 since 256 is a
power of two. Modulo 256 thus means “take the last eight bits” and is realized
by a bit-by-bit AND operation with 255:

      N(IAND(255,K)) = IEOR(N(IAND(255,K-250)), N(IAND(255,K-103)))

Now k can run from one to “infinity” while the indices of the array N vary only
between 0 and 255, saving memory. Note, however, that we require

      DIMENSION N(0:255)

instead of DIMENSION N(256) at the beginning of the program.
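The seeding loop and the recycled indices can be put together as follows. This is a sketch, not the program on the diskette: the function names are hypothetical, and the words are stored as unsigned 32-bit patterns, which have the same bits as the signed Fortran integers.

```python
def to_int32(x):
    """Keep the lowest 32 bits of x and read them as a signed integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def seed_r250(iseed):
    """Fill n[1..250]; each bit is one if and only if IBM*16807 came out negative."""
    ibm = 2 * iseed - 1
    n = [0] * 256                      # indices 0..255, as in DIMENSION N(0:255)
    for k in range(1, 251):
        ici = 0
        for _ in range(32):
            ici = (ici << 1) & 0xFFFFFFFF
            ibm = to_int32(ibm * 16807)
            if ibm < 0:
                ici |= 1
        n[k] = ici
    return n


def r250(n, count, start=251):
    """Produce `count` new words, writing each one into slot k AND 255."""
    out = []
    for k in range(start, start + count):
        value = n[(k - 250) & 255] ^ n[(k - 103) & 255]
        n[k & 255] = value
        out.append(value)
    return out


table = seed_r250(1)
print(r250(table, 3))
```

The recycling is safe because slot k AND 255 last held N(K-256), which is no longer needed once N(K-250) and N(K-103) are the oldest words in use.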

4 A Modern Example 

Let us now discuss a modern example, besides the typical random walk, percolation,
and Ising models. In 1994 Derrida et al. [4] found that nontrivial and at
that time unexplained exponents govern the relaxation of the one-dimensional
Ising model into equilibrium, if initially all spins are randomly oriented and if
the absolute temperature is set to zero in a Glauber (heat bath) simulation:
“spinodal decomposition at T = 0”. The number of spins which have never
flipped decreases to zero asymptotically proportional to $(\mathrm{time})^{-\theta}$ in one, two
and three dimensions. Even though the one-dimensional Ising model was found
by Ernst Ising in 1925 not to have a phase transition at finite temperatures,
here at T = 0 it has an unexpected exponent $\theta$ which was estimated within
months by better and better simulations as being close to 3/8 until this result
was proven analytically also by Derrida et al. This is one of the rare cases where
computer simulations indicated a new effect which thereafter could be explained
theoretically. (The behavior in higher dimensions is not yet explained.)

So we fill a one-dimensional array IS randomly with +1 and -1. Then we 
go again and again regularly through the lattice and leave every spin as it is if 
its two neighbors on the chain have the same value as the “spin” IS(I). If both
neighbors have the orientation opposite to the center spin, this center spin is
always flipped into the orientation of its neighbors. If the two neighbors differ,
the center spin does not know whom to follow and flips with probability 1/2.
If we imagine our variables IS to be ferromagnetic spins, then we never flip if 
this would increase the energy, always if this decreases the energy, and with 
probability 1/2 if the energy remains unchanged. 

That's all; a simple program is listed here.

c     one-dimensional spinodal decomposition in T=0 Ising model
c     (Derrida et al)
      parameter (L=10000, Lp1=L+1)
      dimension is(0:Lp1), never(0:Lp1)
      byte is, never
      data max/1000/, iseed/1/
      print *, L, max, iseed
      ibm=2*iseed-1
c     random initialization: half up and half down
      do 1 i=0,Lp1
        never(i)=1
        ibm=ibm*16807
        is(i)=1
 1    if(ibm.lt.0) is(i)=-1
      do 2 itime=1,max
        do 3 i=1,L
          ien=is(i)*(is(i-1)+is(i+1))
          if(ien.gt.0) goto 3
          ibm=ibm*16807
c         flip spin if this lowers E, or
c         with probability 1/2 if E=const
          if(ien.lt.0.or.ibm.lt.0) then
            is(i)=-is(i)
            never(i)=0
          endif
 3      continue
        m=0
        n=0
c       compute magnetization m and number n of
c       never flipped spins
        do 4 i=1,L
          m=m+is(i)
 4      n=n+never(i)
 2    print *, itime,n,m
      stop
      end
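For readers without a Fortran compiler, the same dynamics can be sketched in Python, together with a crude estimate of the decay exponent from a straight-line fit of log n against log t. The function names, the smaller default sizes, and the least-squares fit are assumptions of this sketch and not part of the program above; no particular value of the slope is claimed here.

```python
import math


def to_int32(x):
    """Keep the lowest 32 bits of x and read them as a signed integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def run_chain(length=1000, sweeps=50, iseed=1):
    """Return a list of (time, never-flipped count n, magnetization m)."""
    ibm = 2 * iseed - 1
    spins = [1] * (length + 2)         # sites 0 and length+1 are fixed boundaries
    never = [1] * (length + 2)
    for i in range(length + 2):
        ibm = to_int32(ibm * 16807)
        if ibm < 0:
            spins[i] = -1
    history = []
    for t in range(1, sweeps + 1):
        for i in range(1, length + 1):
            ien = spins[i] * (spins[i - 1] + spins[i + 1])
            if ien > 0:                # flipping would raise the energy
                continue
            ibm = to_int32(ibm * 16807)
            if ien < 0 or ibm < 0:     # always if E drops, else with probability 1/2
                spins[i] = -spins[i]
                never[i] = 0
        history.append((t, sum(never[1:length + 1]), sum(spins[1:length + 1])))
    return history


def loglog_slope(history):
    """Least-squares slope of log n versus log t; 0.0 if fewer than two points."""
    pts = [(math.log(t), math.log(n)) for t, n, _ in history if n > 0]
    if len(pts) < 2:
        return 0.0
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx


history = run_chain()
print(history[-1], loglog_slope(history))
```

The negative of the slope is only a rough estimate of the exponent; early times and finite chain length both distort it, so larger chains and longer runs are needed for a meaningful comparison with 3/8.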



5 Problems 

A famous and particularly simple example to prove that random numbers are
not always random enough is the exercise of emptying a cube. Let is(i,j,k)
be an $L \times L \times L$ array initially filled with ones. During each iteration, we
randomly select, $L^3$ times, first an i, then a j, and finally a k as coordinates, and
empty the corresponding site, i.e., we set is(i,j,k) = 0 if it was still occupied
before. Theoretically, the number of still occupied sites should decay as $\exp(-t)$
in t iterations. Using multiplication with 65539, we indeed find good results
for L = 10, but already at L = 20 serious deviations are found after a few
iterations: a quarter of the sites are not reached. This effect is seen within a
few seconds at most and is not a slight deviation observed only in high-quality
computations.

We can avoid this problem by replacing 65539 with 16807 (in at least one
of the three appearances), or by using one-dimensional storage: instead of three
indices running each from 1 to L, we use one running from 1 to $L^3$. This trick is
generally recommended since it speeds up the simulation, particularly on vector
computers for small L. In our case, the correlations causing our difficulties no
longer disturb us since each site now requires only one random number and no
longer three.
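The cube experiment is short enough to try directly. The sketch below is one possible reading of it: the text does not say how a random integer is turned into a coordinate, so the mapping `int(L * z)` with the normalized number z is an assumption, as are all function names. Whether the missing quarter of the sites shows up may depend on that mapping, so no output is quoted.

```python
import math


def to_int32(x):
    """Keep the lowest 32 bits of x and read them as a signed integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def empty_cube(L, iterations, multiplier, iseed=1):
    """Return the occupied fraction after each iteration of L**3 random picks."""
    ibm = 2 * iseed - 1
    occupied = [[[1] * L for _ in range(L)] for _ in range(L)]
    remaining = L ** 3
    fractions = []
    for _ in range(iterations):
        for _ in range(L ** 3):
            coords = []
            for _ in range(3):         # first i, then j, then k
                ibm = to_int32(ibm * multiplier)
                z = 0.5 + 2.0 ** -32 * ibm
                coords.append(int(L * z))   # assumed mapping to 0..L-1
            i, j, k = coords
            if occupied[i][j][k]:
                occupied[i][j][k] = 0
                remaining -= 1
        fractions.append(remaining / L ** 3)
    return fractions


for mult in (65539, 16807):
    observed = empty_cube(10, 3, mult)
    expected = [math.exp(-t) for t in (1, 2, 3)]
    print(mult, observed, expected)
```

Comparing the observed fractions with $\exp(-t)$ for several L and both multipliers is then a matter of changing the arguments.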

Until recently, multiplication with 16807 was regarded as simpler but less 
reliable than Kirkpatrick-Stoll. In 1992, a group at University of Georgia (USA) 
[5] found errors they called “dramatic” in simulations of the two-dimensional 
Ising model, using Kirkpatrick-Stoll, whereas things worked well with *16807. 
Even the New York Times reported on it at length. With all due respect to the 
world’s best newspaper, I cannot find these deviations by less than one percent 
in the energy very dramatic. However, further investigations by other groups 
confirmed that R250 leads to difficulties which are avoided with *16807. There 
are correlations between random numbers which are 250 iterations apart. If one 
uses larger numbers than the above pair (103,250) then these correlations show 
up at the corresponding longer intervals, and thus in general are less disturbing. 
No difficulties of this type are yet known to me for Ziff’s random number gen- 
erator [6] which combines four random integers (from an array of nearly 10 000) 
via exclusive-ors to produce the next random integer. 

However, *16807 may have even more dramatic errors, ignored by the New 
York Times. Using the above-mentioned one-dimensional storage to simulate a 
five-dimensional Ising model right at its critical point, with initially all spins 
up, I found the magnetization to deviate after only 20 iterations by a factor 
of two from the correct value if the linear lattice size is L = 32. For L = 31 or 
L = 33 the effect vanishes. (Also in three dimensions at L = 128 I had problems.)
Apparently there are some correlations between random numbers separated by 
a power of two in the number of random number generations. 

Thus the sad truth is that there is no good and simple random number 
generator. What is good for one problem may be bad for another one. The best 
way is not to rely on some mathematical tests which a generator is said to 
have passed successfully, but to test it for your particular application. Also, use 
widespread generators like *16807 and R250, and not some new one claimed by 
somebody to be excellent: it may have some hidden errors which have not yet 
been found simply because it is new and not widespread. 

So I stopped worrying and started to love *16807, relying on the famous last
words of the movie “Some Like It Hot”. I avoid powers of two for L, prefer prime
numbers for L, vary the lattice size to check for suspicious deviations and use 
one-dimensional storage. Moreover, outside the inner loop I may produce one 
extra time a new random number and throw it away, to destroy periodicities; 
some people even throw away most of their random numbers and use only a small 
fraction of them. Repetitions of random numbers can also be avoided if outside 
the innermost loop some intermediate result of the simulation is used to modify 
the random numbers, e.g., by IBM = IBM*(2*M+1) when M is the magnetization. 
(This helped M. Siegert for large 2D Ising models. After about $10^9$ iterations,
*16807 alone will produce exactly the same sequence of random numbers.)
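The repetition can be quantified without running the generator to exhaustion. With an odd start value, the sequence repeats after as many steps as the multiplicative order of the multiplier modulo $2^{32}$, and for an odd multiplier that order is a power of two, so repeated squaring finds it. This is a sketch of that standard number-theory argument; the function name is hypothetical.

```python
def order_mod_power_of_two(a, bits=32):
    """Smallest e with a**e == 1 (mod 2**bits), for odd a; e is a power of two."""
    modulus = 1 << bits
    x = a % modulus
    e = 1
    while x != 1:
        x = (x * x) % modulus      # squaring doubles the exponent
        e *= 2
    return e


for a in (16807, 65539):
    print(a, order_mod_power_of_two(a))
```

For 16807 the order is $2^{29} \approx 5.4 \times 10^8$, so the sequence of 32-bit integers is exhausted after roughly half a billion calls; 65539 lasts twice as long.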

Then, to check my results, I may replace a few of the 16807 by 65539, or use 
R250 instead. If nothing changes within the statistical errors then I am satisfied; 
otherwise I should worry and try the Ziff complication. In other words, it's not
an exact science.

6 Summary 

You may feel disappointed because of these unreliable foundations of Monte
Carlo methods. However, in most cases, published results which turned out to
be wrong were erroneous not because of random numbers or programming errors
but because of systematic errors due to finite relaxation times in finite lattices.
And the fact that people today argue about the sixth digit in the value $J/k_B T_c =
0.22165\ldots$ of the critical point in the 3D Ising model suggests that in spite of all
these difficulties, high-precision Monte Carlo studies are possible.
1. Generate sequences of real random numbers:

(a) following the algorithm

$k_{n+1} = (c\,k_n) \bmod M$ ,

with the parameters

$M = 2^{\ldots}$, $c = 5$, $k_0 = 1$ ;

(b) using the “16807 generator”.

2. Using any kind of graphics software at hand, draw points with coordinates
(r from exercise 1)

$x = r_{2i-1}$ , $y = r_{2i}$ , $i = 1, 2, 3, \ldots$ ,

into a Cartesian coordinate system. What can be concluded from the distribution
of the points about the correlation of consecutive numbers and
the quality of the generators?

3. Verify that the “16807 generator” also produces correlations by magnifying
cut-outs of the graphics in exercise 2, for example the square $0 < x < \ldots$,
$0 < y < \ldots$ .




Monte Carlo Simulations of Spin Systems* 



Wolfhard Janke 

Institut für Physik, Johannes Gutenberg-Universität, D-55099 Mainz, Germany
e-mail: janke@miro.physik.uni-mainz.de



Abstract. This chapter gives a brief introduction to Monte Carlo simulations of classical
O(n) spin systems such as the Ising (n = 1), XY (n = 2), and Heisenberg (n = 3)
models. In the first part I discuss some aspects of the use of Monte Carlo algorithms 
to generate the raw data. Here special emphasis is placed on nonlocal cluster update 
algorithms which proved to be most efficient for this class of models. The second part 
is devoted to the data analysis at a continuous phase transition. For the example of the 
three-dimensional Heisenberg model it is shown how precise estimates of the transition 
temperature and the critical exponents can be extracted from the raw data. I conclude 
with a brief overview of recent results from similar high-precision studies of the Ising 
and XY models. 



1 Introduction 

The statistical mechanics of complex physical systems pose many hard problems 
which are difficult to solve by analytical approaches. Numerical simulation tech- 
niques will therefore be indispensable tools on our way to a better understanding 
of systems like (spin) glasses, disordered magnets, or proteins, to mention only a 
few classical problems. Quantum statistical problems in condensed matter or the 
broad field of elementary particles and quantum gravity in high-energy physics 
would fill many other volumes such as this. 

The numerical tools can roughly be divided into molecular dynamics (MD) 
and Monte Carlo (MC) simulations. With the ongoing advances in computer 
technology both approaches are expected to gain even more importance than 
they already have today. In the past few years the predictive power of the MC 
approach in particular was considerably enhanced by the development of greatly 
improved simulation techniques. Not all of them are already well-enough under- 
stood to be applicable to really complex physical systems. But as a first step 
it is gratifying to see that at least for relatively simple spin systems orders of 
magnitude of computing time can be saved by these refinements. The purpose 
of this lecture is to give a brief overview on what is feasible today. 

From a theoretical point of view spin systems are also of current interest since on
the one hand they provide the possibility of comparing completely different approaches
such as field theory, series expansions, and simulations, and on the
other hand they are the ideal testing ground for conceptual considerations such
as universality or finite-size scaling. And last but not least they have found a
revival in slightly disguised form in quantum gravity and conformal field theory,
where they serve as idealized “matter” fields on Feynman graphs or fluctuating
manifolds.

* Software included on the accompanying diskette.

The rest of this chapter is organized as follows. In the next section I first recall
the definition of O(n) spin models and the definition of standard observables
such as the specific heat and the susceptibility. Then some properties of phase
transitions are summarized and the critical exponents are defined. In Sect. 3,
Monte Carlo methods are described, and Sect. 4 is devoted to an overview of
reweighting techniques. In Sect. 5, applications to the three-dimensional classical
Heisenberg model are discussed, and in Sect. 6, I conclude with a few comments
on similar simulations of the Ising and XY models.

2 Spin Models and Phase Transitions 

2.1 Models and Observables 

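In the following we shall confine ourselves to O(n) symmetric spin models whose
partition function is defined as

$$Z_n(\beta) = \sum_{\{\vec\sigma_i\}} \exp(-\beta H_n) , \qquad (1)$$

with

$$H_n = -J \sum_{\langle ij \rangle} \vec\sigma_i \cdot \vec\sigma_j ; \quad \vec\sigma_i = (\sigma_i^{(1)}, \sigma_i^{(2)}, \ldots, \sigma_i^{(n)}) ; \quad |\vec\sigma_i| = 1 . \qquad (2)$$

Here $\beta = 1/k_B T$ is the inverse temperature, the spins $\vec\sigma_i$ live on the sites $i$ of
a $D$-dimensional cubic lattice of volume $V = L^D$, and the symbol $\langle ij \rangle$ indicates
that the lattice sum runs over all $2D$ nearest-neighbor pairs. We always assume
periodic boundary conditions.

Standard observables are the internal energy per site, $e = E/V$, with $E =
-d \ln Z_n / d\beta \equiv \langle H_n \rangle$, and the specific heat,

$$C/k_B = \beta^2 \left( \langle H_n^2 \rangle - \langle H_n \rangle^2 \right) / V . \qquad (3)$$

On finite lattices the magnetization and susceptibility are usually defined as

$$m = M/V = \langle |\vec\sigma_{\rm av}| \rangle ; \quad \vec\sigma_{\rm av} = \sum_i \vec\sigma_i / V , \qquad (4)$$

$$\chi = \beta V \left( \langle \vec\sigma_{\rm av}^2 \rangle - \langle |\vec\sigma_{\rm av}| \rangle^2 \right) . \qquad (5)$$

In the high-temperature phase one often employs the fact that the magnetization
vanishes in the infinite volume limit and defines

$$\chi' = \beta V \langle \vec\sigma_{\rm av}^2 \rangle . \qquad (6)$$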
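In a simulation these observables are estimated as averages over a series of measurements. A minimal sketch, assuming the usual fluctuation forms $C/k_B = \beta^2(\langle H^2\rangle - \langle H\rangle^2)/V$, $\chi = \beta V(\langle\sigma_{\rm av}^2\rangle - \langle|\sigma_{\rm av}|\rangle^2)$ and $\chi' = \beta V\langle\sigma_{\rm av}^2\rangle$, and assuming that the total energies and the moduli $|\vec\sigma_{\rm av}|$ have already been recorded in plain lists (all function names are hypothetical):

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def specific_heat(energies, beta, volume):
    """C/k_B from the fluctuations of the total energy H."""
    e1 = mean(energies)
    e2 = mean([h * h for h in energies])
    return beta ** 2 * (e2 - e1 ** 2) / volume


def susceptibility(sigma_abs, beta, volume):
    """chi from the fluctuations of |sigma_av| (finite-lattice definition)."""
    m1 = mean(sigma_abs)
    m2 = mean([s * s for s in sigma_abs])
    return beta * volume * (m2 - m1 ** 2)


def susceptibility_high_t(sigma_abs, beta, volume):
    """chi' without the subtraction, as used in the high-temperature phase."""
    return beta * volume * mean([s * s for s in sigma_abs])


print(specific_heat([-4.0, -2.0], beta=1.0, volume=2))
print(susceptibility([0.5, 1.0], beta=2.0, volume=4))
```

The sketch gives plain sample averages only; statistical errors and autocorrelations of the measurements are not treated here.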

Similarly, the spin-spin correlation function can then be taken as

$$G(\vec x_i - \vec x_j) = \langle \vec\sigma_i \cdot \vec\sigma_j \rangle . \qquad (7)$$

At large distances, $G(\vec x)$ decays exponentially and the correlation length $\xi$ can
be defined as

$$\xi = - \lim_{|\vec x| \to \infty} |\vec x| / \ln G(\vec x) . \qquad (8)$$
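For a purely exponential decay $G(x) = A \exp(-x/\xi)$ the definition can be tried out numerically. The sketch below evaluates $-x/\ln G(x)$ at a finite distance and, as an alternative that is not taken from the text, the ratio of two values of G, which removes the unknown amplitude A. Function names and the test numbers are assumptions.

```python
import math


def xi_from_definition(x, g):
    """Finite-distance version of the definition: -x / ln G(x)."""
    return -x / math.log(g)


def xi_from_ratio(x1, g1, x2, g2):
    """Correlation length from two distances; the amplitude cancels."""
    return (x2 - x1) / math.log(g1 / g2)


xi, amplitude = 5.0, 2.0
g = lambda x: amplitude * math.exp(-x / xi)
for x in (10.0, 100.0, 1000.0):
    print(x, xi_from_definition(x, g(x)), xi_from_ratio(x, g(x), 2 * x, g(2 * x)))
```

With an amplitude different from one the first estimate approaches $\xi$ only slowly as x grows, while the ratio gives $\xi$ at any pair of distances, provided the decay really is a single exponential.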

For n = 1, the partition function $Z_n(\beta)$ describes the standard Ising model
where the spins can take only the two discrete values, $\sigma_i = \pm 1$. For
n = 2, 3, . . . the spins vary continuously on the n-dimensional unit sphere.
Particularly thoroughly studied cases are n = 2 (XY model) and n = 3 (Heisenberg
model). The limit $n \to \infty$ is known to be equivalent to the spherical model of
Berlin and Kac [1]. In three dimensions (3D) the model exhibits for all n a
continuous phase transition from an ordered low-temperature phase to a disordered
high-temperature phase. The associated critical exponents are generic for the
so-called O(n) universality classes. In two dimensions (2D) the situation is a little
more complex. For the 2D Ising model the exact solution by Onsager and later
Yang predicts a continuous order-disorder phase transition similar to 3D. For all
$n \ge 2$, however, the spin degrees of freedom are continuous and, as a consequence
of the Mermin-Wagner-Hohenberg theorem, the magnetization vanishes for all
temperatures. The 2D XY model nevertheless displays a very peculiar (infinite
order) Kosterlitz-Thouless transition [2, 3, 4]. Due to the O(2) symmetry this
model admits point-like topological defects (vortices) which are tightly bound
to pairs at low temperatures. With increasing temperature isolated vortices are
entropically favored, and the transition is usually pictured as the point where
vortex pairs start to dissociate; for a review see, e.g., Kleinert [5]. For the 2D
Heisenberg model and all other 2D O(n) models with $n \ge 3$, on the other hand,
it is commonly believed that there is no phase transition at finite temperature.^
For later reference we also recall another generalization of the Ising model,
the q-state Potts model [7] with Hamiltonian

$$ H_{Potts} = -J \sum_{\langle ij \rangle} \delta_{\sigma_i \sigma_j} , \qquad \sigma_i \in 1, \dots, q . \tag{9} $$

This generalization has in 3D for all $q \ge 3$ a first-order transition and in 2D it
is exactly known to exhibit a second-order transition for $q \le 4$ and a first-order
transition for all $q \ge 5$ [8, 9].



2.2 Phase Transitions 

In limiting cases such as low and high temperatures (or fields, pressure, etc.) the
physical degrees of freedom usually decouple and the statistical mechanics of even
complex systems becomes quite manageable. Much more interesting is the region
in between these extremes where strong cooperation effects may cause phase
transitions, e.g., from an ordered phase at low temperatures to a disordered phase
at high temperatures. The prediction of properties of this most difficult region
of a phase diagram as accurately as possible is the most challenging objective of
all statistical mechanics approaches, including numerical simulations.

^ For an alternative view see, however, [6] and references to earlier work therein.

The theory of phase transitions is a very broad subject described comprehen- 
sively in many textbooks. Here we shall be content with a rough classification 
into first-order and second-order (or, more generally, continuous) phase transi- 
tions, and a summary of those properties that are most relevant for numerical 
simulations. Some characteristic properties of first- and second-order phase tran- 
sitions are sketched in Fig. 1. 



Fig. 1. The characteristic behavior of the magnetization, m, specific heat, C, and
susceptibility, $\chi$, at first- and second-order phase transitions (panels: First-Order
Transition, plotted against $T/T_0$; Second-Order Transition, plotted against $T/T_c$)



Most phase transitions in nature are of first order [10, 11, 12, 13]. The best- 
known example is the field-driven transition in magnets at temperatures below 
the Curie point, and the paradigm of a temperature-driven first-order transition 
experienced every day is ordinary melting [14, 15]. In general, first-order phase 
transitions are characterized by discontinuities of the order parameter (the jump 
$\Delta m$ of the magnetization m in Fig. 1), or the energy (the latent heat $\Delta e$), or
both. This reflects the fact that, at the transition temperature $T_0$, two (or more)
phases can coexist. In the example of a magnet at low temperatures the coexisting
phases are the phases with positive and negative magnetization, whereas
at the melting transition they are the solid (ordered) and liquid (disordered)
phases. The correlation length in the coexisting pure phases is finite.^ Consequently
the specific heat, C, and the susceptibility, $\chi$, also do not diverge in the
pure phases. Mathematically there are, however, superimposed delta-function-like
singularities associated with the jumps of e and m.

In this chapter we will mainly consider second-order phase transitions, which
are characterized by a divergent correlation length $\xi$ at the transition temperature
$T_c = 1/\beta_c$. The growth of correlations as one approaches the critical region
from high temperatures is illustrated in Fig. 2, where six typical configurations
of the 2D Ising model at inverse temperatures $\beta/\beta_c$ = 0.50, 0.70, 0.85, 0.90,
0.95, and 0.98 are shown. Because for an infinite correlation length fluctuations
on all length scales are equally important, one expects power-law singularities
in thermodynamic functions. The leading singularity of the correlation length is
usually parametrized in the high-temperature phase as

$$ \xi = \xi_{0+} |1 - T/T_c|^{-\nu} + \dots \quad (T > T_c) , \tag{10} $$

where the . . . indicate subleading corrections (analytical as well as confluent).
This defines the critical exponent $\nu$ and the critical amplitude $\xi_{0+}$ on the
high-temperature side of the transition. In the low-temperature phase one expects a
similar behavior,

$$ \xi = \xi_{0-} (1 - T/T_c)^{-\nu} + \dots \quad (T < T_c) , \tag{11} $$

with the same critical exponent $\nu$ but a different critical amplitude $\xi_{0-} \ne \xi_{0+}$.

An important feature of second-order phase transitions is that due to the
divergence of $\xi$ the short-distance details of the Hamiltonian should not matter.
This is the basis of the universality hypothesis which states that all systems with 
the same symmetries and same dimensionality should exhibit similar singularities 
governed by one and the same set of critical exponents. For the amplitudes this 
is not true, but certain amplitude ratios are also universal. 

The singularities of the specific heat, magnetization, and susceptibility are
similarly parametrized by the critical exponents $\alpha$, $\beta$, and $\gamma$, respectively,

$$ C = C_{reg} + C_0 |1 - T/T_c|^{-\alpha} + \dots , \tag{12} $$

$$ m = m_0 (1 - T/T_c)^{\beta} + \dots , \tag{13} $$

$$ \chi = \chi_0 |1 - T/T_c|^{-\gamma} + \dots , \tag{14} $$


where Creg is a regular background term, and the amplitudes are again different 
on the two sides of the transition, cf. Fig. 1. 

^ For the 2D q-state Potts model with $q \ge 5$, where many exact results are known,
this is illustrated by the recent simulations of Janke and Kappler [16, 17, 18, 19, 20];
for details see [21].
Fig. 2. The growth ... region (lower right) ... Ising configuration ...
Finite-Size Scaling. For systems of finite size, as in any numerical simulation,
the correlation length cannot diverge, and also the divergences in all other
quantities are then rounded and shifted. This is illustrated in Fig. 3, where the
specific heat of the 2D Ising model on various $L \times L$ lattices is shown. The
curves are computed from the exact solution of Ferdinand and Fisher [22] for
any $L_x \times L_y$ lattice with periodic boundary conditions.




Fig. 3. Finite-size scaling behavior of the specific heat of the 2D Ising model on $L \times L$
lattices. The critical point is indicated by the arrow on the top axis



Near $T_c$ the role of $\xi$ in the scaling formulas is then taken over by the linear
size of the system, L. By writing $|1 - T/T_c| \propto \xi^{-1/\nu} \to L^{-1/\nu}$ we see that at $T_c$
the scaling laws (12)-(14) are replaced by the finite-size scaling (FSS) ansätze,

$$ C = C_{reg} + a L^{\alpha/\nu} + \dots , \tag{15} $$

$$ m \propto L^{-\beta/\nu} + \dots , \tag{16} $$

$$ \chi \propto L^{\gamma/\nu} + \dots . \tag{17} $$



More generally these scaling laws are valid in the vicinity of $T_c$ as long as the
scaling variable $x = (1 - T/T_c) L^{1/\nu}$ is kept fixed [23, 24, 25, 26]. In particular
this is true for the locations $T_{max}$ of the (finite) maxima of thermodynamic
quantities, which are expected to scale with the system size as
$T_{max} = T_c (1 - x_{max} L^{-1/\nu} + \dots)$. In this more general formulation the scaling law
for, for example, the susceptibility reads $\chi(T, L) = L^{\gamma/\nu} f(x)$. By plotting
$\chi(T, L)/L^{\gamma/\nu}$ vs the scaling variable x, one thus expects that the data for
different T and L would fall onto a kind of master curve. While this is a nice way
to demonstrate the scaling properties qualitatively, it is not particularly suited
for quantitative analyses.
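The rescaling behind such a master-curve plot can be sketched as follows. This is only an illustration, assuming that estimates of the critical temperature and of the exponents are supplied by the user; the function name is hypothetical.

```python
import numpy as np

def collapse_coordinates(T, chi, L, Tc, nu, gamma):
    """Rescale susceptibility data for a finite-size scaling collapse plot.

    Returns the scaling variable x = (1 - T/Tc) L^(1/nu) and the rescaled
    susceptibility chi / L^(gamma/nu). With correct Tc, nu and gamma, data
    for different L should fall onto one curve f(x).
    """
    T = np.asarray(T, dtype=float)
    chi = np.asarray(chi, dtype=float)
    x = (1.0 - T / Tc) * L ** (1.0 / nu)
    y = chi / L ** (gamma / nu)
    return x, y
```

Plotting y against x for several lattice sizes in one figure then gives the qualitative check described above.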

Since the goals of most simulation studies of spin systems are high-precision 
estimates of the critical temperature and the critical exponents, one therefore 
prefers fits either to the “thermodynamic” scaling laws (12)-(14) or to the FSS 
ansätze (15)-(17).

Similar considerations for first-order phase transitions also show that the del- 
ta-function-like singularities originating from the phase coexistence are smeared 
out for finite systems [27, 28, 29, 30, 31]. In finite systems they are replaced by 
narrow peaks whose height (width) grows proportional to the volume (1/volume)
[32, 33, 34, 35, 36, 37, 38]. 



3 The Monte Carlo Method 



Let us now discuss how the expectation values in (3)-(7) can be computed 
numerically. A direct summation of the partition function is impossible, since 
even for the Ising model with only two possible states per site the number of 
terms would be enormous: $2^{2500} \approx 10^{753}$ for a 50 × 50 lattice! Also a naive random
sampling of the spin configurations does not work. Here the problem is that the
relevant region in the high-dimensional phase space is relatively narrow and 
hence too rarely hit by random sampling. The solution to this problem has been 
known for a long time. One has to use the importance sampling technique [39] 
which is designed to draw configurations according to their Boltzmann weight, 

$$ P^{eq}[\{\sigma_i\}] \propto \exp(-\beta H[\{\sigma_i\}]) . \tag{18} $$



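In more mathematical terms one sets up a Markov chain,

$$ \dots \to \{\sigma_i\} \to \{\sigma_i\}' \to \{\sigma_i\}'' \to \dots $$

with a transition operator W satisfying the conditions

(a) $W(\{\sigma_i\} \to \{\sigma_i'\}) \ge 0$ for all $\{\sigma_i\}, \{\sigma_i'\}$ , (19)

(b) $\sum_{\{\sigma_i'\}} W(\{\sigma_i\} \to \{\sigma_i'\}) = 1$ for all $\{\sigma_i\}$ , (20)

(c) $\sum_{\{\sigma_i\}} W(\{\sigma_i\} \to \{\sigma_i'\}) P^{eq}[\{\sigma_i\}] = P^{eq}[\{\sigma_i'\}]$ for all $\{\sigma_i'\}$ . (21)

From (21) we see that $P^{eq}$ is a fixed point of W. A somewhat simpler sufficient
condition is detailed balance,

$$ P^{eq}[\{\sigma_i\}] W(\{\sigma_i\} \to \{\sigma_i'\}) = P^{eq}[\{\sigma_i'\}] W(\{\sigma_i'\} \to \{\sigma_i\}) . \tag{22} $$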

After an initial equilibration time, expectation values can then be estimated as
an arithmetic mean over the Markov chain, e.g., for an observable O,

$$ \langle O \rangle \approx \frac{1}{M} \sum_{j=1}^{M} O[\{\sigma_i\}]_j . \tag{23} $$
A more detailed exposition of the basic concepts underlying any Monte Carlo
algorithm can be found in many textbooks and reviews [23, 40, 41, 42]. 



3.1 Estimators and Autocorrelation Times 



In principle there is no limitation on the choice of observables. The expectation
value $\langle O \rangle$ of any observable O can be estimated in a MC simulation as a simple
arithmetic mean over the Markov chain, $\bar{O} = \frac{1}{N} \sum_{j=1}^{N} O_j$, where $O_j = O[\{\sigma_i\}]_j$
is the measurement after the jth iteration. Conceptually it is important to
distinguish between the expectation value $\langle O \rangle$, which is an ordinary number, and
the estimator $\bar{O}$, which is a random number fluctuating around the theoretically
expected value. Of course, in practice one does not probe the fluctuations of
the estimator directly (which would require repeating the whole MC simulation
many times), but rather estimates its variance $\sigma_{\bar{O}}^2 = \langle \bar{O}^2 \rangle - \langle \bar{O} \rangle^2$
from the distribution of $O_j$. If the N measurements $O_j$ were all uncorrelated
then the relation would simply be $\sigma_{\bar{O}}^2 = \sigma_{O_j}^2 / N$, with $\sigma_{O_j}^2 = \langle O_j^2 \rangle - \langle O_j \rangle^2$. For
correlated measurements one obtains after some algebra

$$ \sigma_{\bar{O}}^2 = \frac{\sigma_{O_j}^2}{N} \, 2 \tau_{O,int} , \tag{24} $$

where the integrated autocorrelation time,

$$ \tau_{O,int} = \frac{1}{2} + \sum_{j=1}^{N} A(j) \left( 1 - \frac{j}{N} \right) , \tag{25} $$

turns out to be a sum (“integral”) over the autocorrelation function,

$$ A(j) = \frac{\langle O_i O_{i+j} \rangle - \langle O_i \rangle \langle O_i \rangle}{\langle O_i^2 \rangle - \langle O_i \rangle \langle O_i \rangle} . \tag{26} $$

For large time separations the autocorrelation function decays exponentially,

$$ A(j) \to a \, e^{-j/\tau_{O,exp}} \quad (j \to \infty) , \tag{27} $$

with a being a constant. This defines the exponential autocorrelation time $\tau_{O,exp}$.
Due to the exponential decay of A(j), in any meaningful simulation with
$N \gg \tau_{O,exp}$, the correction term in parentheses in (25) can safely be neglected. Notice
that only if A(j) is a pure exponential do the two autocorrelation times, $\tau_{O,int}$
and $\tau_{O,exp}$, coincide (up to minor corrections for small $\tau_{O,int}$) [43].

The important point is that for correlated measurements the statistical error
$\epsilon_{\bar{O}} = \sqrt{\sigma_{\bar{O}}^2}$ on the MC estimator $\bar{O}$ is enhanced by a factor of $\sqrt{2 \tau_{O,int}}$. This
can be rephrased by writing the statistical error similar to the uncorrelated
case as $\epsilon_{\bar{O}} = \sqrt{\sigma_{O_j}^2 / N_{eff}}$, but now with an effective statistics parameter
$N_{eff} = N / 2 \tau_{O,int}$. This shows more clearly that only at every $2 \tau_{O,int}$ iteration are the
measurements approximately uncorrelated and gives a better idea of the relevant
effective size of the statistical sample. Since some quantities (e.g., the specific
heat or susceptibility) can severely be underestimated if the effective statistics 
are too small [44], any serious simulation should therefore provide at least a 
rough order-of-magnitude estimate of autocorrelation times. 

Unfortunately, it is very difficult to give reliable a priori estimates, and an 
accurate numerical analysis is often too time consuming (as a rough estimate it 
is about ten times harder to get precise information on dynamic quantities than 
on static quantities such as critical exponents). To get at least an idea of the 
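orders of magnitude, it is useful to record the “running” autocorrelation time

$$ \tau_{O,int}(k) = \frac{1}{2} + \sum_{j=1}^{k} A(j) , \tag{28} $$

which approaches $\tau_{O,int}$ in the limit of large k. Approximating the tail end of A(j) by a
single exponential as in (27), one derives [45]

$$ \tau_{O,int}(k) = \tau_{O,int} - a \, \frac{e^{-k/\tau_{O,exp}}}{e^{1/\tau_{O,exp}} - 1} . \tag{29} $$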



The latter expression may be used for a numerical estimate of both the expo- 
nential and integrated autocorrelation times. 
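A rough estimate of the integrated autocorrelation time from a recorded time series can be sketched as follows. The code accumulates the running sum 1/2 + A(1) + ... + A(k) and stops at a self-consistent cutoff k of about six times the current estimate. This windowing rule, the factor 6, and the function names are assumptions made for this illustration; they are a common practical choice and not a prescription taken from the text.

```python
import numpy as np

def tau_int_windowed(series, window_factor=6.0):
    """Running integrated autocorrelation time, cut off self-consistently.

    Accumulates tau(k) = 1/2 + sum_{j<=k} A(j) and stops as soon as
    k >= window_factor * tau(k). Assumes the series is not constant.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = np.dot(x, x) / n
    tau = 0.5
    for k in range(1, n // 2):
        # normalized autocorrelation function at time separation k
        tau += np.dot(x[:n - k], x[k:]) / ((n - k) * var)
        if k >= window_factor * tau:
            break
    return tau

def error_of_mean(series, window_factor=6.0):
    """Statistical error of the mean, enhanced by sqrt(2 tau_int)."""
    x = np.asarray(series, dtype=float)
    tau = tau_int_windowed(x, window_factor)
    return np.sqrt(x.var() / len(x) * 2.0 * tau)
```

For uncorrelated data the estimate stays close to 1/2, so that the usual error formula is recovered.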

To summarize this subsection, any realization of a Markov chain (i.e., MC 
update algorithm) is characterized by autocorrelation times which enter directly 
in the statistical errors of MC estimates. Since correlations always increase the 
statistical errors, it is a very important issue to develop MC update algorithms 
that keep autocorrelation times as small as possible. In the next subsection
we first discuss the classical Metropolis algorithm as an example of an update
algorithm that near criticality is plagued by huge temporal correlations. The
discussion of cluster updates in Sect. 3.3 then demonstrates that there
indeed exist clever ways of overcoming this critical slowing-down problem.



3.2 Metropolis Algorithm 



In the standard Metropolis algorithm [46] the Markov chain is realized by local
updates of single spins. If E and E' denote the energy before and after the spin
flip, respectively, then the probability of accepting the proposed spin update is
given by

$$ W(\{\sigma_i\} \to \{\sigma_i'\}) = \begin{cases} \exp[-\beta (E' - E)] & E' \ge E \\ 1 & E' < E \end{cases} \tag{30} $$



If the energy is lowered, the spin flip is always accepted. But even if the energy
is increased, the flip has to be accepted with a certain probability to ensure the
proper treatment of entropic contributions. In thermal equilibrium the free energy
is minimized and not the energy. Only at zero temperature ($\beta \to \infty$) does
this probability tend to zero and the MC algorithm degenerates to a minimization
algorithm for the energy functional. With some additional refinements, this
is the basis of the simulated annealing technique, also discussed in this volume,
which is usually applied to hard optimization and minimization problems.
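As a consistency check, one can verify that the Metropolis acceptance rule satisfies detailed balance with respect to the Boltzmann weight $P^{eq} = e^{-\beta E}/Z$. The short derivation below assumes that a spin flip and its reverse are proposed with equal probability, which holds for the single-spin updates considered here. It treats the case $E' \ge E$; the opposite case follows by exchanging the roles of the two configurations.

```latex
P^{eq}[\{\sigma_i\}] \, W(\{\sigma_i\} \to \{\sigma_i'\})
  = \frac{e^{-\beta E}}{Z} \, e^{-\beta (E' - E)}
  = \frac{e^{-\beta E'}}{Z} \cdot 1
  = P^{eq}[\{\sigma_i'\}] \, W(\{\sigma_i'\} \to \{\sigma_i\}) ,
  \qquad E' \ge E .
```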

There are many ways of choosing the spins to be updated. The lattice sites 
may be picked at random or according to a random permutation, which can 
be updated every now and then. But also a simple fixed lexicographical order 
is permissible. Or one updates first all odd and then all even sites, which is 
the usual choice in vectorized codes. A so-called lattice sweep is completed on 
average^ when an update was proposed for all spins. 
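A lattice sweep in fixed lexicographical order can be sketched as follows for the 2D Ising model. This is a minimal illustration, assuming coupling J = 1, periodic boundary conditions, and a lattice of at least 3 × 3 sites; the function name and the returned acceptance count are choices made for this example.

```python
import numpy as np

def metropolis_sweep(spins, beta, rng):
    """One lexicographic Metropolis sweep of a 2D Ising lattice (J = 1).

    spins : 2D integer array with entries +1 or -1, modified in place.
    Returns the number of accepted spin flips.
    """
    L0, L1 = spins.shape
    accepted = 0
    for i in range(L0):
        for j in range(L1):
            # sum over the four nearest neighbours with periodic boundaries
            nn = (spins[(i + 1) % L0, j] + spins[(i - 1) % L0, j]
                  + spins[i, (j + 1) % L1] + spins[i, (j - 1) % L1])
            dE = 2.0 * spins[i, j] * nn          # E' - E for flipping spin (i, j)
            # accept always if the energy is lowered, else with exp(-beta dE)
            if dE <= 0.0 or rng.random() < np.exp(-beta * dE):
                spins[i, j] = -spins[i, j]
                accepted += 1
    return accepted
```

A typical call would be `metropolis_sweep(spins, 0.44, np.random.default_rng())`, repeated for the equilibration and measurement sweeps.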

The advantage of this simple algorithm is its flexibility which allows its
application to a great variety of physical systems. The great disadvantage, however, is
that this algorithm is plagued by large autocorrelation times, as are most other local
update algorithms (one exception is the overrelaxation method [47, 48, 49, 50]).
Empirically one finds that the autocorrelation time grows as a power of the
spatial correlation length,

$$ \tau \propto \xi^z , \tag{31} $$

with a dynamical critical exponent $z \approx 2$. Heuristically this can be understood
by assuming that local excitations diffuse through the system like a random walk.
Since $\xi$ diverges at criticality, the Metropolis algorithm thus severely suffers from
critical slowing down. Of course, in finite systems $\xi$ cannot diverge. Then $\xi$ is
replaced by the linear lattice size L, yielding $\tau \propto L^z$.

The problem of critical slowing down can be overcome by nonlocal update 
algorithms. In the past few years several different types of such algorithms have 
been proposed. Quite promising results were reported with Fourier acceleration 
[51] and multigrid techniques [52, 53, 54, 55, 56, 57]. A very nice pedagogical 
introduction to these techniques is given by Sokal [58, 59]. For the O(n) spin
models considered here, however, the best performance was achieved with cluster 
update algorithms which will be described in the next subsection in more detail. 

3.3 Cluster Algorithms 

As we shall see below, cluster update algorithms [60, 61] are much more powerful 
than the Metropolis algorithm. Unfortunately, however, they are less generally 
applicable. We therefore consider first only the Ising model, where the prescrip- 
tion for cluster update algorithms can easily be read off from the equivalent 
Fortuin-Kasteleyn representation [62, 63, 64, 65], 



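$$ Z = \sum_{\{\sigma_i\}} \exp\left( \beta \sum_{\langle ij \rangle} \sigma_i \sigma_j \right) \tag{32} $$

$$ = \sum_{\{\sigma_i\}} \prod_{\langle ij \rangle} e^{\beta} \left[ (1 - p) + p \, \delta_{\sigma_i \sigma_j} \right] \tag{33} $$

$$ = \sum_{\{\sigma_i\}} \sum_{\{n_{ij}\}} \prod_{\langle ij \rangle} e^{\beta} \left[ (1 - p) \, \delta_{n_{ij},0} + p \, \delta_{\sigma_i \sigma_j} \delta_{n_{ij},1} \right] , \tag{34} $$

with

$$ p = 1 - e^{-2\beta} . \tag{35} $$

^ This is only relevant in the case where the lattice sites are picked at random.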

Here the $n_{ij}$ are bond variables which can take the values $n_{ij} = 0$ or 1, interpreted
as “deleted” or “active” bonds. In the first line we used the trivial fact that
the product $\sigma_i \sigma_j$ of two Ising spins can only take the two values ±1, so that
$\exp(\beta \sigma_i \sigma_j) = x + y \, \delta_{\sigma_i \sigma_j}$ can easily be solved for x and y. In the second line we
made use of the “deep” identity $a + b = \sum_{n=0}^{1} (a \, \delta_{n,0} + b \, \delta_{n,1})$.
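The two steps can be made explicit. Evaluating the ansatz $\exp(\beta \sigma_i \sigma_j) = x + y \, \delta_{\sigma_i \sigma_j}$ for unlike and for like spins fixes x and y, and the identity for a + b is then applied bond by bond with the assignments given in the last line:

```latex
\sigma_i \ne \sigma_j : \quad e^{-\beta} = x , \qquad
\sigma_i = \sigma_j : \quad e^{\beta} = x + y
\quad \Longrightarrow \quad
x = e^{\beta} (1 - p) , \quad y = e^{\beta} p , \quad p = 1 - e^{-2\beta} ,
\\
a + b = \sum_{n=0}^{1} \left( a \, \delta_{n,0} + b \, \delta_{n,1} \right) ,
\qquad a = 1 - p , \quad b = p \, \delta_{\sigma_i \sigma_j} .
```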

Swendsen-Wang Cluster. According to (34) a cluster update sweep then
consists of alternating updates of the bond variables $n_{ij}$ for given spins with
updates of the spins $\sigma_i$ for a given bond configuration. In practice one proceeds
as follows:

1. Set $n_{ij} = 0$ if $\sigma_i \ne \sigma_j$, or assign values $n_{ij} = 1$ and 0 with probability p and
$1 - p$, respectively, if $\sigma_i = \sigma_j$, cf. Fig. 4.

2. Identify clusters of spins that are connected by “active” bonds ($n_{ij} = 1$).

3. Draw a random value ±1 independently for each cluster (including one-site
clusters), which is then assigned to all spins in a cluster.

Technically the cluster identification part is the most complicated step, but there 
are by now quite a few efficient algorithms available which can even be used on 
parallel computers. Vectorization, on the other hand, is only partially possible. 
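The three steps above can be sketched for the 2D Ising model as follows. The cluster identification uses a simple union-find structure, which is one possible choice among the available labeling algorithms and is an assumption of this example, as are the function names, coupling J = 1, and periodic boundary conditions.

```python
import numpy as np

def find(parent, a):
    """Root of site a in the union-find forest (with path halving)."""
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a

def swendsen_wang_sweep(spins, beta, rng):
    """One Swendsen-Wang sweep of a 2D Ising lattice, modified in place.

    Returns the number of stochastic clusters found in this sweep.
    """
    L0, L1 = spins.shape
    p = 1.0 - np.exp(-2.0 * beta)
    parent = list(range(L0 * L1))
    # Steps 1 and 2: activate bonds between like spins with probability p
    # and merge the two sites joined by every active bond.
    for i in range(L0):
        for j in range(L1):
            a = i * L1 + j
            for ni, nj in (((i + 1) % L0, j), (i, (j + 1) % L1)):
                if spins[i, j] == spins[ni, nj] and rng.random() < p:
                    ra, rb = find(parent, a), find(parent, ni * L1 + nj)
                    if ra != rb:
                        parent[rb] = ra
    roots = np.array([find(parent, a) for a in range(L0 * L1)])
    # Step 3: one random value per site is drawn, but only the entry at the
    # root of each cluster is used, so all spins of a cluster agree.
    new_value = rng.choice(np.array([-1, 1]), size=L0 * L1)
    spins[:, :] = new_value[roots].reshape(L0, L1)
    return len(np.unique(roots))
```

Each bond is visited once by looking only at the "right" and "down" neighbour of every site.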

Notice the difference between the just-defined stochastic clusters and geometric
clusters whose boundaries are defined by drawing lines through bonds
between unlike spins. In fact, since in the stochastic cluster definition bonds
between like spins are also “deleted” with probability $p_0 = 1 - p = \exp(-2\beta)$,
stochastic clusters are on average smaller than geometric clusters. Only at zero
temperature ($\beta \to \infty$) does $p_0$ approach zero and the two cluster definitions
coincide. As described above, the cluster algorithm is referred to as Swendsen-Wang
(SW) or multiple-cluster update [60]. The distinguishing point is that the
whole lattice is decomposed into stochastic clusters whose spins are assigned a
random value +1 or -1. In one sweep one thus attempts to update all spins of
the lattice.

Fig. 4. Illustration of the bond variable update (panels: $n_{ij} = 0$ always; $n_{ij} = 1$ with
probability $p_1 = p$; $n_{ij} = 0$ with probability $p_0 = 1 - p$). The bond between unlike spins is
always “deleted” as indicated by the dashed line. A bond between like spins is only
“active” with probability $p = 1 - \exp(-2\beta)$. Only at zero temperature ($\beta \to \infty$) do
stochastic and geometric clusters coincide



Wolff Cluster. Shortly after the original discovery of cluster algorithms, Wolff
[61] proposed a somewhat simpler variant in which only a single cluster is flipped
at a time. This variant is therefore sometimes also called a single-cluster algorithm.
Here one chooses a lattice site at random, constructs only the cluster
connected with this site, and then flips all spins of this cluster. A typical example
is shown in Fig. 5. In principle, one could also here choose a value +1 or
-1 at random, but then nothing at all would be changed if one hits the current
value of the spins. Here a sweep consists of $V / \langle |C| \rangle$ single cluster steps, where
$\langle |C| \rangle$ denotes the average cluster size. With this definition autocorrelation times
are directly comparable with results from the Metropolis or Swendsen-Wang
algorithm. Apart from being somewhat easier to program, Wolff’s single-cluster
variant is usually more efficient than the Swendsen-Wang multiple-cluster algorithm,
especially in 3D. The reason is that with the single-cluster method on
average larger clusters are flipped.
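A single-cluster step can be sketched as follows for the 2D Ising model. The cluster is grown from the seed site with a stack, and spins are flipped as soon as they join the cluster, so that flipped sites double as the cluster marker. This bookkeeping, the function name, coupling J = 1, and periodic boundary conditions are assumptions of this illustration.

```python
import numpy as np

def wolff_update(spins, beta, rng):
    """One Wolff single-cluster step for a 2D Ising lattice, in place.

    Returns the size of the flipped cluster.
    """
    L0, L1 = spins.shape
    p = 1.0 - np.exp(-2.0 * beta)
    i0, j0 = int(rng.integers(L0)), int(rng.integers(L1))   # random seed site
    seed_value = spins[i0, j0]
    spins[i0, j0] = -seed_value          # flip immediately; marks cluster membership
    stack = [(i0, j0)]
    size = 1
    while stack:
        i, j = stack.pop()
        for ni, nj in (((i + 1) % L0, j), ((i - 1) % L0, j),
                       (i, (j + 1) % L1), (i, (j - 1) % L1)):
            # bonds to like (not yet flipped) neighbours are active with probability p
            if spins[ni, nj] == seed_value and rng.random() < p:
                spins[ni, nj] = -seed_value
                stack.append((ni, nj))
                size += 1
    return size
```

Recording the returned cluster sizes gives the average cluster size needed to convert single-cluster steps into sweeps.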



Embedded Cluster. While it is quite easy to generalize the derivation (32)-(35)
to q-state Potts models, for the O(n) spin models with $n \ge 2$ one needs a
new strategy [61, 66, 67, 68]. Here the basic idea is to isolate Ising degrees of
freedom by projecting $\sigma_i$ onto a randomly chosen unit vector r,

$$ \sigma_i = \sigma_i^{\parallel} + \sigma_i^{\perp} ; \qquad \sigma_i^{\parallel} = \epsilon_i |\sigma_i \cdot r| \, r ; \qquad \epsilon_i = \mathrm{sign}(\sigma_i \cdot r) . \tag{36} $$

If this is inserted into the original Hamiltonian one ends up with an effective
Hamiltonian

$$ H = - \sum_{\langle ij \rangle} J_{ij} \epsilon_i \epsilon_j + \mathrm{const} , \tag{37} $$

with positive random couplings,

$$ J_{ij} = J |\sigma_i \cdot r| |\sigma_j \cdot r| \ge 0 , \tag{38} $$

whose Ising degrees of freedom can be updated with a cluster algorithm as
described above.
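The ingredients of the embedding can be sketched as follows. The helper names are hypothetical, and two points are assumptions of this example: the bond probability is obtained by replacing $\beta$ with $\beta J_{ij}$ in the Ising expression for p, and a vanishing projection is assigned the sign +1 by convention.

```python
import numpy as np

def random_unit_vector(n, rng):
    """Uniformly distributed unit vector r in n dimensions."""
    r = rng.normal(size=n)
    return r / np.linalg.norm(r)

def embedded_ising(spins, r):
    """Ising signs eps_i and magnitudes |sigma_i . r| for spins of shape (V, n)."""
    proj = spins @ r
    eps = np.where(proj >= 0.0, 1, -1)
    return eps, np.abs(proj)

def embedded_coupling(abs_proj, i, j, J=1.0):
    """Positive random coupling J_ij = J |sigma_i . r| |sigma_j . r|."""
    return J * abs_proj[i] * abs_proj[j]

def bond_probability(beta, J_ij):
    """Activation probability for a bond between sites with eps_i = eps_j."""
    return 1.0 - np.exp(-2.0 * beta * J_ij)

def reflect(spins, r, cluster):
    """Flip eps_i for all sites in the cluster, i.e. reflect sigma_i at the
    plane perpendicular to r. Modifies the float array spins in place."""
    spins[cluster] -= 2.0 * np.outer(spins[cluster] @ r, r)
```

Flipping the embedded Ising variable leaves the perpendicular component of each spin unchanged, so the unit length of the spins is preserved.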

For O(n) spin models the performance of both types of cluster algorithms
is excellent. As is demonstrated in Table 1 and Fig. 6, critical slowing down is 
drastically reduced. We see that especially in three dimensions the Wolff cluster 
algorithm performs better than the Swendsen-Wang algorithm. Compared with 
the Metropolis algorithm, factors of up to 10 000 in CPU time have been saved 
in realistic simulations [69, 70]. 








Fig. 5. Illustration of the Wolff cluster update. Upper left: Initial configuration. Upper
right: The stochastic cluster is marked. Lower left: Final configuration after flipping the
spins in the cluster. Lower right: The flipped cluster. The shown spin configuration is
from an actual simulation of the 2D Ising model at $0.97 \times \beta_c$ on a 100 × 100 lattice



Improved Estimators. A further advantage of cluster algorithms is that they
lead quite naturally to so-called improved estimators which are designed to further
reduce the statistical errors. Suppose we want to measure the expectation
value $\langle O \rangle$ of an observable O. Then any estimator $\hat{O}$ satisfying $\langle \hat{O} \rangle = \langle O \rangle$ is
permissible. This does not determine $\hat{O}$ uniquely since there are infinitely many
other possible choices, $\hat{O}' = \hat{O} + \hat{X}$, where the added estimator $\hat{X}$ has zero
expectation, $\langle \hat{X} \rangle = 0$. The variances of the estimators $\hat{O}'$, however, can be quite
different and are not necessarily related to any physical quantity (contrary to
the standard mean-value estimator of the energy whose variance is proportional
to the specific heat). It is exactly this freedom in the choice of $\hat{O}$ which allows
the construction of improved estimators.
Table 1. Dynamic critical exponents z for the 2D and 3D Ising model ($\tau \propto L^z$)

Algorithm              D=2        D=3      Observable      Reference
Metropolis             2.125      2.03
Swendsen-Wang cluster  0.35(1)    0.75(1)  $z_{E,exp}$     [60]
                       0.27(2)    0.50(3)  $z_{E,int}$     [71]
                       0.20(2)    0.50(3)                  [71]
                       O(log L)   -        $z_{M,exp}$     [72]
                       0.25(5)    -        $z_{M,rel}$     [73]
Wolff cluster          0.26(2)    0.28(2)  $z_{E,int}$     [71]
                       0.13(2)    0.14(2)  $z_{\chi,int}$  [71]
                       0.25(5)    0.3(1)   $z_{E,rel}$     [74]



Fig. 6. Double logarithmic plot of the integrated autocorrelation times for the
Swendsen-Wang (SW) and Wolff algorithm of the 3D XY model near criticality. The squares
($\beta = 0.45421$) are taken from [75], the circles ($\beta = 0.4539$) and diamonds ($\beta = 0.4543$)
from [76]



For the single-cluster algorithm an improved “cluster estimator” for the spin-spin
correlation function in the high-temperature phase, $G(x_i - x_j) = \langle \sigma_i \cdot \sigma_j \rangle$,
is given by [67]

$$ \hat{G}(x_i - x_j) = n \frac{V}{|C|} \, r \cdot \sigma_i \; r \cdot \sigma_j \; \Theta_C(x_i) \Theta_C(x_j) , \tag{39} $$

where r is the normal of the mirror plane used in the construction of the cluster
of size |C| and $\Theta_C(x)$ is its characteristic function (= 1 if $x \in C$ and 0 otherwise).
For the Fourier transform, $\tilde{G}(k) = \sum_x G(x) \exp(-i k \cdot x)$, this implies the
improved estimator

$$ \hat{\tilde{G}}(k) = \frac{n}{|C|} \left[ \left( \sum_{i \in C} r \cdot \sigma_i \cos k \cdot x_i \right)^2 + \left( \sum_{i \in C} r \cdot \sigma_i \sin k \cdot x_i \right)^2 \right] , \tag{40} $$

which, for k = 0, reduces to an improved estimator for the susceptibility $\chi'$ in
the high-temperature phase,

$$ \hat{\tilde{G}}(0) = \hat{\chi}'/\beta = \frac{n}{|C|} \left( \sum_{i \in C} r \cdot \sigma_i \right)^2 . \tag{41} $$

For the Ising model (n = 1) this reduces to $\chi'/\beta = \langle |C| \rangle$, i.e., the improved
estimator of the susceptibility is just the average cluster size of the single-cluster
update algorithm. For the XY and Heisenberg model one finds empirically that
in two as well as in three dimensions $\langle |C| \rangle \approx 0.81 \chi'/\beta$ for n = 2 [66, 76] and
$\langle |C| \rangle \approx 0.75 \chi'/\beta$ for n = 3 [67, 77], respectively.

It should be noted that by means of the estimators (39)-(41) a significant 
reduction of variance should only be expected outside the FSS region where the 
average cluster size is small compared with the volume of the system. 



3.4 Multicanonical Algorithms for First-Order Transitions 

Let us finally make a few brief comments on numerical simulations of first-order 
phase transitions [13]. Since here the correlation lengths in the pure phases are 
finite, the numerical problems are completely different from those in the case 
of second-order phase transitions. Here the origin of numerical difficulties near 
the transition point can be traced back to the coexistence of two phases which, 
for finite systems, is reflected by a double-peak structure of the corresponding 
order parameter or energy distribution. The minimum between the two peaks 
is governed by mixed-phase configurations which are strongly suppressed by an
additional Boltzmann factor $\propto \exp(-2 \sigma L^{d-1})$. Here $\sigma$ denotes the interface
tension at the phase boundaries, $L^{d-1}$ is the cross section of the (cubic) system
of size $V = L^d$, and the factor 2 takes into account the usually employed
periodic boundary conditions [78, 79]. The problem of numerical simulations is
to achieve equilibrium between the two phases. The system spends most of the
time in the pure phases. Only very rarely does it “tunnel” through the exponentially
suppressed mixed-phase region from one phase to the other. These rare
tunneling events, however, are necessary to achieve equilibrium between the pure
phases. The relevant time scale of equilibrium simulations is thus given by the
inverse of the additional Boltzmann factor, i.e., the characteristic time $\tau$ grows
exponentially with the system size, $\tau \propto \exp(2 \sigma L^{d-1})$ [80]. Since for an accurate
numerical study the simulation (and thus computing) time must be much
larger than $\tau$, this phenomenon has been termed the exponential or supercritical
slowing down problem.

A surprisingly simple solution to this problem was discovered by Berg and 
Neuhaus [81]. In what they call “multicanonical” simulations one determines 
iteratively artificial weight factors which modify the original Hamiltonian in 
such a way that the order parameter or energy distribution is approximately 
flat between the two peaks of the canonical distribution. Since the system then
no longer has to pass through an exponentially suppressed region, one expects
in multicanonical simulations a drastic reduction of the characteristic time scale
$\tau$. A simple random walk argument suggests a power-law behavior, $\tau \propto V^{\alpha}$,
which has indeed been confirmed in numerical simulations of 2D Potts models
[82, 83, 84].

The multicanonical technique is strictly speaking not an update algorithm 
but a reweighting procedure as discussed in detail in the next section. In prin- 
ciple, it can therefore be combined with any legitimate update algorithm. The 
earlier studies all employed the Metropolis or heat-bath algorithm. In more re- 
cent work it was shown that combinations with multigrid techniques [45, 85, 86] 
and cluster update algorithms [84, 87, 88, 21] are also feasible and can further 
reduce autocorrelation times. 

4 Reweighting Techniques 

Even though the physics underlying reweighting techniques is extremely simple 
and the basic idea has been known for a long time (see the list of references by 
Ferrenberg and Swendsen [89]), their power in practice has been realized only 
quite recently [89, 90]. The best performance is achieved near criticality, and in 
this sense reweighting techniques are complementary to improved estimators. 

If we denote the number of states (spin configurations) that have the same
energy by $\Omega(E)$, the partition function at the simulation point $\beta_0$ can always be
written as^

$$ Z(\beta_0) = \sum_E \Omega(E) e^{-\beta_0 E} . \tag{42} $$

This shows that the energy distribution $P_{\beta_0}(E)$ (normalized to unit area) is
given by

$$ P_{\beta_0}(E) = \frac{\Omega(E) e^{-\beta_0 E}}{Z(\beta_0)} . \tag{43} $$

It is then easy to see that, given $P_{\beta_0}(E)$, the energy distribution is actually
known for any $\beta$,

$$ P_{\beta}(E) = c \, e^{-(\beta - \beta_0) E} P_{\beta_0}(E) , \tag{44} $$

where c is a normalization constant [which in practice is trivially determined
by enforcing the condition $\sum_E P_{\beta}(E) = 1$. Formally, one easily finds that
$c = 1 / \sum_E e^{-(\beta - \beta_0) E} P_{\beta_0}(E) = Z(\beta_0)/Z(\beta)$]. Knowing $P_{\beta}(E)$, we find that
expectation values of the form $\langle f(E) \rangle$ are easy to compute,

$$ \langle f(E) \rangle (\beta) = \sum_E f(E) P_{\beta}(E) . \tag{45} $$

^ For simplicity we consider here only models with discrete energies. If the energy
varies continuously, sums have to be replaced by integrals, etc. Also lattice size
dependencies are suppressed to keep the notation short.

Since the relative statistical errors increase in the wings of $P_{\beta_0}(E)$ one expects
(44), (45) to give reliable results only for $\beta$ near $\beta_0$. If $\beta_0$ is near criticality, the
distribution is relatively broad and the method works best. In this case reliable
estimates from (44) can be expected for $\beta$ values in an interval around $\beta_0$ of width
$\propto L^{-1/\nu}$, i.e., just in the FSS region. As a rule of thumb the reweighting range
can be determined by the condition that the peak location of the reweighted
distribution should not exceed the energy values at which the input distribution
had decreased to one third of its maximum value [91]. In most applications this
range is wide enough to locate from a single simulation, e.g., the specific-heat
maximum by using any standard maximization routine.
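The reweighting of a measured energy histogram can be sketched as follows, assuming discrete energies recorded as a time series at the simulation point. The function name is illustrative, and the subtraction of the largest exponent is a numerical safeguard added here; it cancels in the normalization.

```python
import numpy as np

def reweight_histogram(energies, beta0, beta):
    """Reweight the energy histogram measured at beta0 to another beta.

    energies : 1D array of (discrete) energies, one per measurement.
    Returns the distinct energy values and the normalized distribution
    at beta, proportional to exp(-(beta - beta0) E) times the measured one.
    """
    E_values, counts = np.unique(np.asarray(energies), return_counts=True)
    P0 = counts / counts.sum()                      # measured distribution at beta0
    log_w = np.log(P0) - (beta - beta0) * E_values
    log_w -= log_w.max()                            # avoid overflow in exp
    P = np.exp(log_w)
    P /= P.sum()                                    # fixes the normalization constant c
    return E_values, P
```

An expectation value of a function f(E) at the new inverse temperature then follows as `np.sum(f(E_values) * P)`.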

This is illustrated in Figs. 7 and 8, again for simplicity for the 2D Ising 
model. In Fig. 7 the filled circle shows the result of a MC simulation at /3c — 
log(l + \/2)/2 0.440686 . . ., using the Swendsen Wang cluster algorithm with 

5000 sweeps for equilibration and 50000 sweeps for measurements. The results 
of the reweighting procedure are shown as open circles (recall that the spacing 
between the circles can be made as small as desired, here it was chosen quite 
large for clarity of the plot) and compared with the exact curve [22]. We see that 
even with these rather modest statistics the whole specific-heat peak can be 
obtained with reasonable accuracy from a single simulation. But we also notice 
significant deviations in the tails of the peak. To understand the origin of the 
deviations it is useful to have a look at the energy histograms in Fig. 8. The 
curve labeled (3q = pc is the histogram of the MC data at the simulation point, 
and the other two histograms at /3 = 0.375 and P = 0.475 are computed from 
this input histogram by reweighting. For comparison we have also included the 
histograms obtained from additional MC simulations (with the same statistics) 
at the two P values, indicated by the black dots. We see that the reweighted 
histogram at /3 = 0.475 looks smooth to the eye - and indeed agrees very well 
with the “direct” result of the additional MC simulation at this temperature. 
In Fig. 7 this is reflected by the still very good agreement of the numerical and 
the exact result. For P = 0.375, on the other hand, even visually one would 
not trust the reweighted histogram. While the tail on the right-hand side is still 
in reasonable agreement with the “direct” simulation, the left tail is obviously 
hopelessly wrong. By recalling that the reweighted histograms are computed by 
multiplying the input histogram with exponential factors, this is no surprise at 
all. For $-e \lesssim 1$ there are hardly any entries in the input histogram and hence
the relative statistical errors ($\propto 1/\sqrt{\text{counts}}$) are huge. This is the source of the
large deviations from the exact curve in Fig. 7 for $\beta \lesssim 0.4$.
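The reweighting step and the one-third rule of thumb can be sketched in a few lines of Python. This is only an illustration, not the code on the accompanying diskette: the function names `reweight_histogram` and `within_rule_of_thumb` and the toy histogram are assumptions made here, and the energies are total energies $E$ rather than energies per site.

```python
import math

def reweight_histogram(hist, beta0, beta):
    """Reweight an energy histogram {E: counts} recorded at beta0 to beta.

    Every bin is multiplied by exp(-(beta - beta0) * E); the result is
    normalized to unit sum.
    """
    # Work with logarithms and subtract the maximum to avoid overflow.
    log_w = {E: math.log(n) - (beta - beta0) * E for E, n in hist.items() if n > 0}
    shift = max(log_w.values())
    w = {E: math.exp(lw - shift) for E, lw in log_w.items()}
    norm = sum(w.values())
    return {E: x / norm for E, x in w.items()}

def within_rule_of_thumb(hist, peak_energy):
    """True if the input histogram at the reweighted peak position still
    holds at least one third of its maximum count."""
    return hist.get(peak_energy, 0) >= max(hist.values()) / 3.0

# Hypothetical input histogram, peaked at E = -360.
hist = {-380: 40, -370: 900, -360: 5000, -350: 1100, -340: 60, -330: 2}
beta0 = 0.44

for beta in (0.42, 0.25):
    p = reweight_histogram(hist, beta0, beta)
    peak = max(p, key=p.get)
    rel_err = 1.0 / math.sqrt(hist[peak])   # relative error ~ 1/sqrt(counts)
    print(beta, peak, within_rule_of_thumb(hist, peak), round(rel_err, 3))
```

For the small shift the reweighted peak stays where the input histogram is well populated; for the large shift it moves into a sparsely populated bin, which is exactly the situation in which the reweighted result should no longer be trusted.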

Fig. 7. Specific heat of the 2D Ising model computed by reweighting (o) from a single
MC simulation at $\beta_0 = \beta_c$ (•). The continuous line shows for comparison the exact
result of Ferdinand and Fisher [22]

Fig. 8. The energy histogram at the simulation point $\beta_0 = \beta_c$, and reweighted to
$\beta = 0.375$ and $\beta = 0.475$. The black dots show histograms obtained in additional
simulations at these temperatures

The information stored in $P_{\beta_0}(E)$ is not yet sufficient to also calculate the
magnetic moments $\langle m^k \rangle(\beta)$ as a function of $\beta$ from a single simulation at $\beta_0$.
Conceptually, the simplest way to proceed is to record the two-dimensional
histogram $P_{\beta_0}(E, M)$, where $M = mV$ is the total magnetization. Because of disk
space limitations one sometimes prefers to measure instead the “microcanonical
averages”



$$\langle\langle m^k \rangle\rangle(E) = \sum_M P_{\beta_0}(E, M)\, m^k / P_{\beta_0}(E) , \qquad (46)$$

where the relation $\sum_M P_{\beta_0}(E, M) = P_{\beta_0}(E)$ was used. In practice one simply
accumulates the measurements of $m^k$ in different slots or bins according to the
energy of the configuration and normalizes at the end by the total number of
hits of each energy bin. Clearly, once $\langle\langle m^k \rangle\rangle(E)$ is determined, this is a special
case of $f(E)$ in (45), so that



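$$\langle m^k \rangle(\beta) = \frac{\sum_E \langle\langle m^k \rangle\rangle(E)\, P_{\beta_0}(E)\, e^{-(\beta - \beta_0) E}}{\sum_E P_{\beta_0}(E)\, e^{-(\beta - \beta_0) E}} . \qquad (47)$$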
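The binning procedure just described, followed by the reweighting of the binned averages, might look as follows in Python. The sketch assumes that the measurements are available as a list of pairs $(E, m)$; the function names and the toy data are hypothetical and serve only to make the bookkeeping explicit.

```python
import math
from collections import defaultdict

def microcanonical_average(samples, k=2):
    """Accumulate m**k in energy bins and normalize by the hits per bin.

    samples: iterable of (E, m) measurements from one run at beta0.
    Returns ({E: <<m^k>>(E)}, {E: number of hits}).
    """
    sums, hits = defaultdict(float), defaultdict(int)
    for E, m in samples:
        sums[E] += m ** k
        hits[E] += 1
    return {E: sums[E] / hits[E] for E in hits}, dict(hits)

def reweighted_moment(mk_of_E, hits, beta0, beta):
    """<m^k>(beta) from the energy-binned averages and the energy histogram."""
    # Common shift of the exponents; it cancels in the ratio.
    shift = max(-(beta - beta0) * E for E in hits)
    num = den = 0.0
    for E, n in hits.items():
        w = n * math.exp(-(beta - beta0) * E - shift)
        num += mk_of_E[E] * w
        den += w
    return num / den

# Toy data: two energy bins with two measurements each.
samples = [(-4, 1.0), (-4, 0.5), (0, 0.0), (0, 0.5)]
m2_of_E, hits = microcanonical_average(samples, k=2)
print(reweighted_moment(m2_of_E, hits, beta0=0.0, beta=0.0))
print(reweighted_moment(m2_of_E, hits, beta0=0.0, beta=0.5))
```

At $\beta = \beta_0$ the plain average of $m^k$ over all measurements is recovered, which is a convenient consistency check. Energy bins without hits simply do not contribute, which is why the result is reliable only in a finite range around the simulation point.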



Similar to $\Omega(E)$ in (42), theoretically the microcanonical averages $\langle\langle m^k \rangle\rangle(E)$
also do not depend on the temperature at which the simulation is performed.
Due to the limited statistics in the wings of $P_{\beta_0}(E)$, however, there is only a
finite range around $E_0 = \langle E \rangle(\beta_0)$ where one can expect reasonable results for
$\langle\langle m^k \rangle\rangle(E)$. Outside of this range it simply can happen (and does happen) that
there are no events to be averaged. This is illustrated in Fig. 9, where $\langle\langle m^2 \rangle\rangle(E)$
is plotted for the 3D Heisenberg model as obtained from three runs at different
temperatures. We see that the function looks smooth only in the range where
the statistics of the corresponding energy histogram is high enough. To take full
advantage of the histogram reweighting technique it is therefore advisable to
perform a few simulations at slightly different inverse temperatures $\beta_i$. Instead
of spending all computer time in a single long run, it is usually more efficient
to perform three or four shorter runs. To find the best solution, however, is a
very difficult optimization problem, which depends on many details of the model
under study!

Now the question arises how to combine the data from different runs
most efficiently. A very clear way is to compute for each simulation (at $\beta_i$) the
$\beta$ dependence of $O_i = O_i(\beta)$ plus the associated statistical error using
jack-knife techniques [92, 93], say. A single optimal expression for $O = O(\beta)$ is
then obtained by combining the values $O_i$ in such a way that the relative error
$\Delta O / O$ is minimized [77],



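$$O = \left( \frac{O_1}{(\Delta O_1)^2} + \frac{O_2}{(\Delta O_2)^2} + \frac{O_3}{(\Delta O_3)^2} \right) (\Delta O)^2 , \qquad (48)$$

with

$$\frac{1}{(\Delta O)^2} = \frac{1}{(\Delta O_1)^2} + \frac{1}{(\Delta O_2)^2} + \frac{1}{(\Delta O_3)^2} . \qquad (49)$$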
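Equations (48) and (49) amount to an inverse-variance weighted mean. A minimal Python sketch, written for an arbitrary number of runs and with made-up input numbers, could read:

```python
def combine_estimates(values, errors):
    """Combine independent estimates O_i with errors dO_i as in (48), (49).

    Returns the combined value O and its error dO.
    """
    inv_var = [1.0 / e ** 2 for e in errors]
    var = 1.0 / sum(inv_var)                               # (dO)^2 from (49)
    mean = var * sum(v * w for v, w in zip(values, inv_var))  # O from (48)
    return mean, var ** 0.5

# Hypothetical values of one observable at a fixed beta from three runs.
O, dO = combine_estimates([1.0, 1.2, 0.9], [0.1, 0.2, 0.1])
print(O, dO)
```

In an actual analysis this combination would be repeated for every $\beta$ of interest, since both $O_i(\beta)$ and its jack-knife error change with the distance from the respective simulation point $\beta_i$.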

A different procedure at the level of distribution functions was discussed by
Ferrenberg and Swendsen [94]. For the specific heat the two methods were found
[77] to give comparable results within the statistical errors. The optimization
at the level of observables, however, is simpler to apply to quantities involving
constant energy averages such as $\langle\langle m \rangle\rangle(E)$, and, more importantly, minimizes
the error on each observable of interest separately.

Fig. 9. Energy histograms and microcanonical magnetization squared of the 3D Heisenberg
model at the three simulation points for L = 48



5 Applications to the 3D Heisenberg Model 

Let us now turn to applications of the just-described techniques to the 3D classical
Heisenberg model ($n = 3$), focussing on an accurate determination of the
transition temperature and the critical exponents. To this end the cluster update
algorithm proved to be a very important tool. Previous studies [95, 96, 97, 98]
employing the Metropolis algorithm reported for the magnetization an exponential
autocorrelation time of $\tau_{m,\exp} = a L^z$ with $a \approx 3.76$ and $z = 1.94(6)$.
In simulations with the single-cluster algorithm we obtained for the susceptibility
values of $\tau_{\chi,\mathrm{int}} \approx 1.5$-$2.0$ [77, 99, 100]. As for the 3D Ising and 3D XY
models, critical slowing down is thus almost completely eliminated. Compared to
the Metropolis algorithm this implies for an $80^3$ lattice an acceleration of the
simulation by about four orders of magnitude.
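The quoted acceleration can be checked with a short calculation. The sketch below simply inserts $L = 80$ into the Metropolis power law and divides by the cluster values; it assumes that the two autocorrelation times may be compared directly, ignoring the difference between exponential and integrated times and the different amount of work per update.

```python
L = 80
a, z = 3.76, 1.94                 # Metropolis fit parameters quoted above
tau_metropolis = a * L ** z       # autocorrelation time on an 80^3 lattice

for tau_cluster in (1.5, 2.0):    # single-cluster values quoted above
    speedup = tau_metropolis / tau_cluster
    print(tau_cluster, round(tau_metropolis), round(speedup))
```

The ratio comes out of the order of $10^4$, in line with the statement in the text.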

In the paper by Holm and Janke [77] two sets of MC simulations are reported.
The first set of data consists of 18 simulations for $T > T_c$ in the range $\beta = 0.650$
to $0.686 \approx 0.99\beta_c$, with typically about 10^ (almost uncorrelated) measurements.
The correlation length varies in this $\beta$ range from $\xi \approx 3$ to $\xi \approx 12$ (see Fig. 11
below). Here the use of improved estimators was very useful and led to a further
reduction of the statistical errors by a factor of about 2-3.

The second set of simulations was performed near criticality. For each lattice
size typically three independent runs with more than 10^ measurements each
at different $\beta$ values around $\beta_c$ were combined using the optimized reweighting
technique, which is the most important additional tool in the finite-size scaling
region.



5.1 Simulations for T > Tc 

Conceptually the easiest way to measure critical exponents is to simulate in the
high-temperature phase. In principle one simply has to fit the MC data for $C$,
$m$, $\chi$, ... with the expected power-law divergences (10), (12)-(14) at criticality.
For high-precision estimates, however, the procedure is far from trivial.
The problem is to locate the temperature range in which a simple power-law
ansatz like (10) is valid. Clearly, since the omitted correction terms are positive
powers of $T/T_c - 1$, at first sight one would like to perform the simulations
as close to $T_c$ as possible. However, very close to $T_c$ the correlation length gets
very large, and on finite lattices one starts seeing finite-size corrections. The
only way around these correction terms is to use large enough lattice sizes. In
many models one finds empirically that the thermodynamic limit is approached
when the linear lattice size satisfies $L \gtrsim (6\text{-}8)\,\xi$. But since the amplitude $\xi_{0^+}$ is
non-universal, this estimate is not guaranteed to be always true. Therefore this
question must be investigated very carefully for each model separately. With
increasing temperature the correlation length decreases and finite-size corrections
are no longer a problem. Then, however, it is again unclear, for a different reason,
whether the simple power-law ansatz is valid. Very far away from $T_c$ the lattice
structure becomes important and the observables show a completely different
behavior. In an intermediate range one sees the confluent and analytic correction
terms, which are very difficult to take into account in the fits. So in essence
the problem is to locate a temperature window in which $1 \ll \xi \ll L$.

There are many ways to extract the correlation length $\xi$ from the asymptotic
decay of the spatial correlation function,

$$G(x_i - x_j) = \langle \sigma_i \cdot \sigma_j \rangle \propto \exp(-|x_i - x_j|/\xi) . \qquad (50)$$




Wolfhard Janke




Fig. 10. Fit to the inverse of the Fourier transformed correlation function to compute
the correlation length $\xi$



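One way is to measure the Fourier transform, $G(\mathbf{k}) = \sum_{\mathbf{x}} G(\mathbf{x}) \exp(-i \mathbf{k} \cdot \mathbf{x})$,
for a few long-wavelength modes and to perform least-squares fits to

$$G(\mathbf{k})^{-1} = c \left[ \sum_{i=1}^{3} 2 (1 - \cos k_i) + (1/\xi)^2 \right] \approx c \left[ \mathbf{k}^2 + (1/\xi)^2 \right] , \qquad (51)$$

where $c$ is a constant and $k_i = (2\pi/L)\, n_i$ with integer $n_i$. Recall that for zero
momentum the susceptibility is recovered, $G(\mathbf{k} = 0) = \chi'/\beta$. As an example
Fig. 10 shows the data for the 3D Heisenberg model [$\mathbf{n} = (0,0,0)$, (1,0,0), (1,1,0),
(1,1,1), (2,0,0), and (2,1,0)] at $\beta = 0.686 \approx 0.99\beta_c$.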
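Since the inverse of $G(\mathbf{k})$ in (51) is linear in the lattice momentum squared, $\sum_i 2(1 - \cos k_i)$, the fit reduces to a straight-line fit whose slope is $c$ and whose intercept is $c/\xi^2$. The following Python sketch illustrates this with an unweighted least-squares fit to synthetic, noise-free data; the helper names are hypothetical, and a real analysis would weight the points with their statistical errors.

```python
import math

def khat2(L, mode):
    """Lattice momentum squared, sum_i 2*(1 - cos k_i), with k_i = 2*pi*n_i/L."""
    return sum(2.0 * (1.0 - math.cos(2.0 * math.pi * n / L)) for n in mode)

def fit_correlation_length(L, modes, G_values):
    """Unweighted straight-line fit of 1/G(k) against khat2; returns (xi, c)."""
    x = [khat2(L, mode) for mode in modes]
    y = [1.0 / g for g in G_values]
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    slope = sum((a - xm) * (b - ym) for a, b in zip(x, y)) / sum((a - xm) ** 2 for a in x)
    intercept = ym - slope * xm
    # 1/G = c*khat2 + c/xi^2, hence c = slope and xi = sqrt(slope/intercept)
    return math.sqrt(slope / intercept), slope

# Synthetic data generated from (51) with known parameters (assumed values).
L = 48
modes = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1), (2, 0, 0), (2, 1, 0)]
xi_true, c_true = 12.0, 0.7
G_values = [1.0 / (c_true * (khat2(L, n) + 1.0 / xi_true ** 2)) for n in modes]

xi, c = fit_correlation_length(L, modes, G_values)
print(xi, c)
```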

By repeating this analysis for different temperatures one obtains the data
shown in Fig. 11. The solid lines are fits according to

$$\xi(T) = \xi_{0^+} (T/T_c - 1)^{-\nu} , \qquad (52)$$

with

$$\beta_c = 0.69281 \pm 0.00004 , \qquad (53)$$

$$\nu = 0.698 \pm 0.002 , \qquad (54)$$

$$\xi_{0^+} = 0.484 \pm 0.002 , \qquad (55)$$

and a goodness-of-fit parameter $Q = 0.92$, and

$$\chi'(\beta)/\beta = \chi_0 (1 - \beta/\beta_c)^{-\gamma} , \qquad (56)$$

with

$$\beta_c = 0.69294 \pm 0.00003 , \qquad (57)$$

$$\gamma = 1.391 \pm 0.003 , \qquad (58)$$

$$\chi_0 = 0.955 \pm 0.006 , \qquad (59)$$

and $Q = 0.93$. Notice that the correlation length in (52) is written as a function
of $T$ and the susceptibility in (56) as a function of $\beta$. The high quality of the
fits a posteriori justifies this choice and indicates that in the chosen temperature
range confluent as well as analytic correction terms are negligible. If we rewrite
$T/T_c - 1 = (1 - \beta/\beta_c) + (1 - \beta/\beta_c)^2 + \cdots$ and consider $\xi(\beta)$ instead of $\xi(T)$, we
expect (and indeed confirmed) an analytical correction to asymptotic scaling.
There is, however, so far no theoretical understanding of why for the particular
choice of arguments the analytical correction terms should vanish.

Fig. 11. Scaling behavior of the correlation length and susceptibility of the 3D Heisenberg
model (two panels, each plotted against $\beta$). The solid lines
are fits to $\xi(T)$ and $\chi'(\beta)$ according to the asymptotic power laws
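As a plausibility check, the fitted power law (52) with the parameters (53)-(55) can be evaluated at the two ends of the simulated range, $\beta = 0.650$ and $\beta = 0.686$, and compared with the range of correlation lengths quoted for the first data set. The short Python sketch below does this and also compares $T/T_c - 1$ with the first two terms of its expansion in $1 - \beta/\beta_c$; the function name `xi_fit` is an assumption made here.

```python
beta_c, nu, xi0 = 0.69281, 0.698, 0.484   # fit values (53)-(55)

def xi_fit(beta):
    """Correlation length from the power law (52), using T/Tc = beta_c/beta."""
    return xi0 * (beta_c / beta - 1.0) ** (-nu)

for beta in (0.650, 0.686):
    t = beta_c / beta - 1.0      # T/Tc - 1
    x = 1.0 - beta / beta_c      # 1 - beta/beta_c
    # t and x + x^2 differ only at third order in x
    print(beta, round(xi_fit(beta), 2), t, x + x * x)
```

The two values bracket the range of roughly 3 to 12 lattice spacings mentioned above, and the difference between $T/T_c - 1$ and $1 - \beta/\beta_c$ is of relative order $1 - \beta/\beta_c$, i.e., at the percent level in this temperature window.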



5.2 Simulations near Tc 

The second set of data consists of simulations near $T_c$ on lattices of size up to $48^3$
[77, 99]. In a later study focussing on topological defects, the maximal size could
even be increased to $80^3$ [100]. In the vicinity of $T_c$, finite-size corrections are
dominant and one has to employ finite-size scaling (FSS) concepts to analyze the
data. Usually one starts with an analysis of ratios of magnetization moments,
e.g., the Binder parameter [78],

$$U_L = 1 - \frac{\langle m^4 \rangle}{3 \langle m^2 \rangle^2} ,$$

which leads to estimates of $\beta_c$ and the critical exponent $\nu$. In the spontaneously
broken low-temperature phase the magnetization distribution develops
a double-peak structure with peaks at $\pm m_0 \neq 0$. Since the width of the
peaks decreases with increasing lattice size one expects that $U_L$ approaches 2/3
for all $T < T_c$. In the disordered high-temperature phase the magnetization
vanishes and the moments are determined by fluctuations alone. In the infinite
volume limit the fluctuations become Gaussian and a simple calculation yields
$U_L \to 2(n - 1)/3n = 4/9$ for $n = 3$. Only at the transition point does one expect a
nontrivial limiting value, which has been estimated by field theoretical methods
[101] to be $U^* = 0.59684\ldots$ for $n = 3$. For finite systems, FSS predicts that the
curves $U_L(\beta)$ for different $L$ intersect around $(\beta_c, U^*)$ with slopes $\propto L^{1/\nu}$, apart
from confluent corrections explaining small systematic deviations. This allows an
almost unbiased estimate of $\beta_c$, $U^*$, and the critical exponent of the correlation
length $\nu$.
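The two trivial limits are easily reproduced numerically. The sketch below assumes the common definition $U_L = 1 - \langle m^4 \rangle / (3 \langle m^2 \rangle^2)$, which is consistent with both limiting values quoted above; the function names are hypothetical.

```python
def binder_parameter(m_samples):
    """U = 1 - <m^4> / (3 <m^2>^2) from a list of magnetization measurements."""
    n = len(m_samples)
    m2 = sum(m * m for m in m_samples) / n
    m4 = sum(m ** 4 for m in m_samples) / n
    return 1.0 - m4 / (3.0 * m2 * m2)

def gaussian_limit(n):
    """High-temperature limit for an n-component Gaussian magnetization,
    for which <m^4>/<m^2>^2 = (n + 2)/n."""
    return 1.0 - (n + 2) / (3.0 * n)

# Ordered phase with vanishing peak width: |m| takes a single value m0.
print(binder_parameter([0.8] * 10))   # approaches 2/3
print(gaussian_limit(3))              # 2(n-1)/3n for n = 3
```

Between these two limits the curves for different lattice sizes cross near the critical point, which is what makes the parameter useful for locating $\beta_c$.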

The data for the 3D Heisenberg model in Fig. 12 clearly confirm the theoretical
expectations with a pronounced intersection point at $(\beta_c, U^*) = (0.6930(1),
0.6217(8))$. The final numbers are actually obtained from a slightly more elaborate
analysis taking into account also the confluent corrections to asymptotic
FSS [77, 99].

Fig. 12. The Binder parameter of the 3D Heisenberg model for various lattice sizes.
The intersection points determine the inverse critical temperature $\beta_c = 0.6930(1)$

Fig. 13. FSS of the Binder parameter slopes at $\beta = 0.6930 \approx \beta_c$. The linear fit yields
an estimate of the correlation length exponent, $\nu = 0.704(6)$

Also the slopes $dU_L/d\beta$ at $\beta = 0.6930 \approx \beta_c$ in Fig. 13 show the expected
behavior and a fit to the FSS prediction $dU_L/d\beta \propto L^{1/\nu}$ yields $\nu = 0.704(6)$,
consistent with (54) and in very good agreement with estimates obtained by field
theoretical methods or from series expansions, cf. Table 2.

In the analysis of the magnetization and susceptibility one proceeds similarly.
The fit to the magnetization data reweighted to $\beta = 0.6930 \approx \beta_c$ shown in
Fig. 14 yields $\beta/\nu = 0.514(1)$, and from the FSS of the susceptibility one reads
off $\gamma/\nu = 1.9729(17)$. By multiplying the exponent ratios with the estimate of
$\nu = 0.704(6)$, we finally arrive at the values for the critical exponents $\beta$ and $\gamma$
given in Table 2. The analysis of the specific heat is much more complicated since
the critical exponent $\alpha$ is usually quite small and therefore the singularity in $C$ is
not very pronounced. For the 3D Heisenberg model $\alpha$ is actually negative, so
that we do not expect a divergence at all. By using the additional data on lattices
up to $80^3$ [100], we obtained from the fit $C = C_{\rm reg} + C_0 L^{\alpha/\nu}$ shown in Fig. 15 an
estimate of $\alpha/\nu = -0.225(80)$, resulting in $\alpha = -0.158(59)$. Due to the rather
large statistical error this estimate is still consistent with the value obtained
from hyperscaling, $\alpha = 2 - 3\nu = -0.112(18)$. Actually a much more precise
result was obtained from the corresponding FSS fit to the energy at $\beta_c$. Using
the ansatz $e = e_{\rm reg} + e_0 L^{(\alpha - 1)/\nu}$, we obtained $\alpha/\nu = -0.166(31)$, translating
into $\alpha = -0.117(23)$. Obviously this value is in much better agreement with
the hyperscaling prediction.
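The conversion from exponent ratios to exponents is simple arithmetic and can be retraced with the central values quoted above. The following Python lines are only a bookkeeping sketch; error propagation is deliberately left out.

```python
nu = 0.704                        # from the Binder parameter slopes

beta_exp = 0.514 * nu             # beta  = (beta/nu)  * nu
gamma_exp = 1.9729 * nu           # gamma = (gamma/nu) * nu
alpha_hyperscaling = 2 - 3 * nu   # alpha = 2 - D*nu with D = 3
alpha_from_C = -0.225 * nu        # from the specific-heat fit
alpha_from_e = -0.166 * nu        # from the energy fit

for name, value in [("beta", beta_exp), ("gamma", gamma_exp),
                    ("alpha (hyperscaling)", alpha_hyperscaling),
                    ("alpha (specific heat)", alpha_from_C),
                    ("alpha (energy)", alpha_from_e)]:
    print(name, round(value, 3))
```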
Table 2. Critical coupling and critical exponents of the 3D classical Heisenberg ($n = 3$)
model

Method           $\beta_c$       $\nu$        $\gamma$      $\beta$       $\alpha$        $\delta$
g-expansion^a    -               0.705(3)     1.386(4)      0.3645(25)    -0.115(9)       4.802(37)
ε-expansion^b    -               0.710(7)     1.390(10)     0.368(4)      -0.130(21)      4.777(70)
MC^c             0.6929(1)       0.706(9)     1.390(23)     0.364(7)      -0.118(27)      4.819(36)
MC^d             0.6930(2)       0.73(4)      -             0.36(2)       -               -
MC^e             0.6930(1)       0.704(6)     1.389(14)     0.362(4)      -0.112(18)      4.837(11)
MC^f             0.693035(37)    0.7036(23)   1.3896(70)    0.3616(31)    -0.1108(69)     -
series^g         0.6929(1)       0.712(10)    1.400(10)     0.363(10)     -0.136(30)      4.86(10)
series^h         0.69302(7)      0.715(3)     1.403(6)      -             -               -

^a [102, 103], ^b [104], ^c [98], ^d [105], ^e [77, 99], ^f [106], ^g [107], ^h [108]




Fig. 14. FSS of the magnetization at $\beta = 0.6930 \approx \beta_c$. The linear fit yields an estimate
of the exponent ratio, $\beta/\nu = 0.514(1)$



6 Concluding Remarks 

The intention of this chapter was to give an elementary introduction to the basic
concepts of modern Monte Carlo simulations and to illustrate their usefulness by
applications to one typical model. Since the choice of the 3D Heisenberg model
was obviously biased by my own work in this field, I want to conclude with at
least a few remarks on the 3D Ising and 3D XY model, for which quite a few
high-precision simulations have also been performed.

Fig. 15. FSS of the specific heat at $\beta = 0.6930 \approx \beta_c$. The non-linear fit taking
into account a regular background term yields an estimate of the exponent ratio,
$\alpha/\nu = -0.225(80)$

Due to its relative simplicity, the 3D Ising model is the best-studied model
of all $O(n)$ spin systems. Apart from finite-size scaling analysis as described
here, many other techniques have also been applied, including the Monte Carlo
renormalization group (MCRG) and the finite-size scaling of partition function
zeros. This has led to quite a few very accurate estimates of critical exponents.
Some of them are compiled in Table 3, where for comparison field theory and
series expansion estimates are also given. As an amusing side remark it is worth
mentioning the Rosengren [113] conjecture that the critical coupling of the 3D
Ising model is given by $\beta_c = \tanh^{-1}[(\sqrt{5} - 2)\cos(\pi/8)] = 0.221658637\ldots$,
a value which is indeed in impressive agreement with the most precise Monte
Carlo estimates!
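The conjectured closed form is easy to evaluate. The one-line Python check below uses only the standard `math` module; the variable name is arbitrary.

```python
import math

# Conjectured critical coupling of the 3D Ising model
beta_c_conjecture = math.atanh((math.sqrt(5.0) - 2.0) * math.cos(math.pi / 8.0))
print(beta_c_conjecture)
```

The result can be set directly against the Monte Carlo entries for $\beta_c$ in Table 3.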

A similar accuracy could also be reached for the 3D XY model, the simplest
model of the $O(2)$ universality class, which governs the critical behavior of
the $\lambda$-transition in liquid helium. Some recent results of Monte Carlo simulations
employing the Swendsen-Wang and Wolff cluster update algorithms, as well
as estimates using series expansions and field theory methods, are compiled in
Table 4.

By comparing the various Monte Carlo estimates collected in Tables 2-4
with results from field theory and series expansions it is fair to conclude that
for $O(n)$ spin models modern Monte Carlo techniques are at the present time
superior to series expansion analyses. The recently derived critical exponents are
in fact competitive in accuracy with estimates obtained with the best and very
elaborate methods of field theory.
Table 3. Critical coupling and selected critical exponents of the 3D Ising ($n = 1$)
model

Method         $\beta_c$         $\nu$         $\gamma$       Reference
g-expansion    -                 0.6300(15)    1.241(2)       [102, 103]
ε-expansion    -                 0.6310(15)    1.2390(25)     [104]
MCRG           0.221652(3)       0.624(2)      -              [109]
MC             0.2216595(26)     0.6289(8)     1.239(7)       [110]
MC             0.2216546(10)     0.6301(8)     1.237(2)       [111]
series         0.221655(5)       0.631(4)      1.239(3)       [112]



Table 4. Critical coupling and selected critical exponents of the 3D classical XY
($n = 2$) model

Method             $\beta_c$       $\nu$        $\gamma$       Reference
$^4$He experiment  -               0.6705(6)    -              [114]
g-expansion        -               0.669(2)     1.3160(25)     [102, 103]
ε-expansion        -               0.671(5)     1.315(7)       [104]
MC                 0.45421(8)      -            1.327(8)       [75]
MC                 0.45408(8)      0.670(2)     1.316(5)       [76]
MC                 0.45417(1)      0.662(7)     1.324(1)       [115]
series             0.45406(5)      0.67(1)      1.315(9)       [116]
series             0.45414(7)      ≈0.673       ≈1.325         [107]
series             0.45420(6)      0.679(3)     1.328(6)       [108]
Appendix: Program Codes

The accompanying diskette contains five FORTRAN programs (is_clu.f,
reweight.f, rew_his.f, 1dis_ex.f, 2disfin_ex.f) and two data files
(3d_e004.plo, 3d_c004.plo) which can be used to reproduce Figs. 3, 7, and
8, and to generate an Ising model analog of Fig. 9. The output files denoted
by ...plo are kept as simple as possible to allow easy plotting, e.g., with the
standard utility gnuplot.



is_clu.f is a Monte Carlo simulation program for the nearest-neighbor Ising
model with subroutines for the Wolff single-cluster (sc), Swendsen-Wang
multiple-cluster (sw), Metropolis (me), and heat-bath (hb) update algorithms, which
can be selected in the main parameter statement. All subroutines are set up to
work for general $D$-dimensional (hyper-)cubic lattices of size $L^D$ with periodic
boundary conditions. The dimension, lattice size, simulation temperature, and
the statistics parameters are also defined in the main parameter statement (the
dimension and lattice size have to be changed globally in all subroutines as well).
With the choice of parameters as given in is_clu.param_example, the 2D Ising
(multiple-cluster) Monte Carlo data of Figs. 7 and 8 can be reproduced within
about one minute run-time. Of course the detailed timing depends on the type of
PC or workstation, but it should always be possible to run the simulation
interactively. The average energy, magnetization, specific heat, susceptibility, Binder
parameter, and cluster averages are written on standard output. Furthermore,
the energy and magnetization histograms, and the “microcanonical” magnetization
averages $\langle\langle |m| \rangle\rangle$ and $\langle\langle m^2 \rangle\rangle$ (cf. (46)) at the simulation temperature are
saved in the files ehis_b0.plo, mhis_b0.plo, malis_b0.plo, and m2lis_b0.plo
for easy plotting, and in e_b0.his for further reweighting analyses (containing
the necessary parameter information). With these output files it is straightforward
to produce for the Ising model a plot similar to Fig. 9. If desired the time
evolution of the energy and magnetization measurements can be saved in the files
e_series.plo and m_series.plo, respectively, by turning on the corresponding
logical switches in the parameter statement.

For simplicity the standard UNIX random number generator RAND() is called
in the MC program. For illustration purposes this generator is good enough, but
for a serious simulation study it should in any case be replaced by a more reliable
random number generator. Again only for simplicity all statistical error analysis
subroutines are omitted. For a sensible MC study this is clearly unacceptable.



reweight.f takes as input the energy histogram and the “microcanonical”
magnetization lists stored in e_b0.his and computes by reweighting the energy and
specific heat, the susceptibilities $\chi/\beta$ and $\chi'/\beta$, and the Binder parameter as a
function of inverse temperature $\beta$. The desired $\beta$-range can be set in the parameter
statement, and the dimension and lattice size parameters must be the same
as in is_clu.f. The results are written into the files e016_mc.plo, c016_mc.plo,
sus016_mc.plo, chi016_mc.plo, and U016_mc.plo, respectively, where 016 indicates
the linear lattice size.

The MC data for the one-dimensional Ising chain can be tested against the
exact results provided by 1dis_ex.f for any chain length. The two-dimensional
MC results can be used together with the output from 2disfin_ex.f to reproduce
Fig. 7. Further comparison data for a $16^2$ lattice from high-statistics single-cluster
simulations at $\beta_c = \ln(1 + \sqrt{2})/2$ are $2 + e = 0.54685(10)$, $C = 1.4978(10)$,
$\chi = 139.669(31)$, $\langle |C| \rangle = 139.656(29)$, and $U = 0.611537(50)$. In three dimensions
the MC data can be compared with the exact energy and specific heat
curves for a $4^3$ lattice contained in the data files 3d_e004.plo and 3d_c004.plo.



rew_his.f again reads as input the energy histogram stored in e_b0.his, computes
reweighted histograms, and stores them in, e.g., ehis_b4750.plo. The
dimension and lattice size parameters must be the same as in is_clu.f. Here
b4750 indicates that the histogram is reweighted to $\beta = 0.4750$. The new inverse
temperature is requested interactively by the program. In this way Fig. 8 can be
reproduced. If gnuplot is used for plotting the histograms, then by using the escape
character "!" the reweighting program can be called and the new histogram
immediately displayed without leaving the plot session.



1dis_ex.f computes the exact temperature dependence of the energy, specific
heat, and susceptibility of the one-dimensional Ising chain with periodic boundary
conditions of arbitrary length $L$. For, e.g., $L = 16$, the results are stored in
the files 1d_e016.plo, 1d_c016.plo, and 1d_chi016.plo.



2disfin_ex.f implements the exact solution of Ferdinand and Fisher (1969)
for the 2D nearest-neighbor interaction Ising model on finite $L_x \times L_y$ lattices
($L_x$, $L_y$ even) with periodic boundary conditions. The desired lattice size and
inverse temperature range can be chosen in the parameter statement of the main
program. The output consists of two data files, e.g., 2d_e016.plo and 2d_c016.plo for
$L_x = L_y = 16$, containing minus the internal energy per site, $-E/V$, and the
specific heat per site, $C$, as a function of the inverse temperature $\beta$. By running
this code for various lattice sizes, Fig. 3 can be reproduced.
Abstract. Metastable systems such as spin glasses show a wealth of interesting
relaxation phenomena. Stochastic optimization procedures such as simulated annealing
help to solve a number of industrially important minimization problems. Here we show
that the two fields are intimately connected by the thermally activated relaxation
dynamics of complex energy landscapes. The numerical as well as the analytical tools to
analyse it are discussed. Finally two applications, aging phenomena in spin glasses and
adaptive simulated annealing procedures, are presented.



1 An Introduction to Complex Systems 

Metastable systems and stochastic optimization - at first sight one wonders what 
these two fields have in common. Why is it worth discussing them together? 

The aim of this paper is to show that both fields are intimately connected,
and that concepts from one field can be put to use in the other. Let us start by
first describing what these fields deal with.

Metastable systems are characterized by states which decay very slowly over
long periods of time. The usual idea is that some mechanism prevents the system
from reaching its true equilibrium easily. Instead the system might be caught in
some state which can only be left very slowly - a metastable state. An example
is a system with deep wells in its free energy function, from which an escape can 
take a very long time. Such a system should reveal itself when, for instance, its 
thermal relaxation is studied. 

Stochastic optimization procedures on the other hand try to provide solu- 
tions to optimization problems which have many local minima in their objective 
function. For these optimization problems the usual steepest descent algorithms 
fail as they get easily caught in local minima. For such problems stochastic op- 
timization procedures, and especially simulated annealing, have been used with 
growing success - not to determine the global minimum but to provide “good” 
solutions with values of the objective which are not too “far” apart from the de- 
sired global minimum. Very often the global minimum can only be determined 
by an enumeration of all feasible solutions, and their number will increase ex- 
ponentially with the problem size. Thus these problems are sometimes called 
NP-hard. 



* Software included on the accompanying diskette. 
Fig. 1. A sketch of a complex state space. The energy is depicted as a function of a
sequence of neighboring states. It resembles a cut through a mountainous landscape
and thus the energy function is sometimes referred to as the energy landscape



From this short characterization of the two fields it becomes clear that in
both cases the existence of a function on a high-dimensional state space which
exhibits many local minima is of great importance. Such a state space is often
called complex. Figure 1 shows a sketch of such a complex state space. It is
important to note that in order to draw a figure like that one needs a definition
of what the neighboring states are. Otherwise one would not know what the
abscissa means. From a mathematical point of view one needs this definition to
define a local minimum (a state where all neighbors have higher energy).

A typical example of a system with a complex state space is an Ising spin
glass [1]. Its energy is given by

$$E = \sum_{i<j} J_{ij} S_i S_j , \qquad (1)$$

where the Ising spins $S_i$ can only take the values +1 or -1, and the coupling
constants $J_{ij}$ are random quantities, which can take positive and negative values.
Here the states are defined by the configuration of all spins $\{S_i\}$, and neighboring
states are obtained from each other by flipping one of the spins.

An often-used picture for complex state spaces is that of a mountainous
landscape, where the heights of the mountains represent the energy and the two
horizontal axes have to mimic two of the many dimensions of the physical system.
Even though this picture is very suggestive, one should bear in mind that the
high dimensionality of the physical system might not be properly taken care of.
So generally speaking this picture can guide the intuition, but the consequences
should be mathematically checked.

We shall now proceed by discussing the dynamics in such complex state
spaces. We are particularly interested in the thermal relaxation dynamics, i.e.,
the dynamics which describe the equilibration of the system in contact with a
heat bath. It will turn out that there are two major tools to analyse the thermal
relaxation. One will come from computational physics, namely simulation
methods, and the other will come from the realm of theoretical physics, namely
the theory of Markov processes [2].

After these two tools have been presented in Sect. 2, we shall use them to
discuss two particular problems occurring in complex systems. The first problem
(Sect. 3) comes from the field of metastable systems and deals with the way
the above-described techniques can be used to model the relaxation of complex
systems in contact with a constant temperature heat bath. This problem will
also serve as an example for the application of Markov processes.

The second problem (Sect. 4) comes from the field of stochastic optimization
and deals with finding the global minimum of such complex systems, or - in
physical terms - the ground state. This problem will make use of simulation
techniques, and will thus provide some insight into the problems there.

2 Dynamics in Complex Systems 

The link between the two topics of this paper is the complex state space. For 
both fields it is important to understand the dynamics in these state spaces. 
For the physical systems the dynamics of interest is the thermal relaxation in 
contact with a heat bath. For the stochastic optimization schemes it will turn 
out that the same dynamics can be very profitably used to obtain solutions to 
complex optimization problems. So let us first turn to the thermal relaxation in 
contact with a heat bath. 

We will view it as a thermally induced hopping process between the config- 
urations of the system. The energy fluctuations between system and bath will 
allow transitions between different states or configurations with varying proba- 
bility depending on their connectivity and energy. 

There are now two tools available to analyse the behavior of the system. One
way to proceed is to use simulations of the thermal hopping process, the other is
to use the theory of Markov processes. Interestingly, it turns out that the latter
can also be used as a theory to describe the stochastic simulations.

2.1 Thermal Relaxation Dynamics: The Metropolis Algorithm 

Let us first deal with the algorithmic approach using simulations. The aim in 
simulating a system in contact with a heat bath is to create a Boltzmann dis- 
tribution in the state space according to the temperature of the heat bath. This 
problem is solved by the Metropolis algorithm [3]. 

Let $\Omega = \{\alpha\}$ represent this state space, let $E(\alpha)$ be the energy defined on
the state space, and let $T$ be the temperature of the heat bath in which the
physical system is immersed. In addition a so-called neighborhood relation or
move class is needed, which typically takes the form of an undirected graph
structure on the state space. We will denote by $N(\alpha)$ the set of neighbors of a
state $\alpha$ in this graph. As an example remember the above-mentioned case of an
Ising spin glass, where two states (= spin configurations) usually are neighbors
if they differ by one spin flip. We remark that for certain problems one may have
several alternative move classes available, for instance for the spin-glass case one
could also use states as neighbors which differ by more than one spin flip.

To start the algorithm a state $\omega_0$ is chosen at random. Then at each step
of the algorithm a neighbor $\omega'$ of the current state $\omega_k$ is selected at random to
become the candidate for the next state. It actually becomes the next state only
with probability

$$P_{\mathrm{acceptance}} = \begin{cases} 1 & \text{if } \Delta E \le 0 \\ \exp(-\Delta E/T) & \text{if } \Delta E > 0, \end{cases} \qquad (2)$$

where $\Delta E = E(\omega') - E(\omega_k)$, and $k_B = 1$ by choice of units. If this candidate is
accepted, then $\omega_{k+1} = \omega'$, otherwise the next state is the same as the old state,
$\omega_{k+1} = \omega_k$.

This probabilistic decision rule is implemented by choosing a random number
$r$, uniformly distributed between 0 and 1, and comparing it with the Boltzmann
factor $\exp(-\Delta E/T)$. If $r < \exp(-\Delta E/T)$, then $\omega_{k+1} = \omega'$, else $\omega_{k+1} = \omega_k$.
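The decision rule (2) can be sketched in a few lines of Python. This is a minimal illustration, not the author's program: the function name `metropolis_step`, the ring-shaped toy state space, and the energy E(i) = i are assumptions chosen only to make the example self-contained.

```python
import math
import random

def metropolis_step(state, energy, neighbors, T, rng=random):
    """One Metropolis step: propose a random neighbor, accept it per rule (2)."""
    candidate = rng.choice(neighbors(state))
    delta_e = energy(candidate) - energy(state)
    # Downhill or flat moves are always accepted; uphill moves are accepted
    # when a uniform random number r satisfies r < exp(-dE/T).
    if delta_e <= 0 or rng.random() < math.exp(-delta_e / T):
        return candidate
    return state

# Toy state space (hypothetical): 10 states on a ring, E(i) = i.
def ring_neighbors(i):
    return [(i - 1) % 10, (i + 1) % 10]

rng = random.Random(0)
state = 5
visits = [0] * 10
for _ in range(20000):
    state = metropolis_step(state, lambda i: float(i), ring_neighbors, T=1.0, rng=rng)
    visits[state] += 1
```

Because every state of the ring has the same number of neighbors, the visit counts should approach a plain Boltzmann distribution, with low-energy states visited far more often than high-energy ones.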



2.2 Thermal Relaxation Dynamics: A Markov Process



A different way of studying thermal relaxation is by modeling it as a discrete time
Markov process. The thermally induced hopping process induces a probability
distribution in the state space. The time development of $P_\alpha(k)$, the probability
to be in state $\alpha$ at step $k$, can then be described by a master equation [2]

$$P_\alpha(k+1) = \sum_\beta \Gamma_{\alpha\beta}(T)\, P_\beta(k). \qquad (3)$$



The transition probabilities $\Gamma_{\alpha\beta}(T)$ depend on the temperature $T$ and can be
chosen in a number of ways, but they always have to ensure that the stationary
distribution is the Boltzmann distribution $P^{\mathrm{eq}}_\alpha(T) = g_\alpha \exp(-E_\alpha/T)/Z$, where
$Z = \sum_\alpha g_\alpha \exp(-E_\alpha/T)$ is the partition function and $g_\alpha$ is the degeneracy of state $\alpha$. The
latter is needed if the states $\alpha$ already represent quantities which include more
than one microstate.

One of the possible choices for the transition probabilities is the Glauber
dynamics [4], which we here just mention. Another one is the Metropolis dynamics,
which turns out to be also the one which describes the process induced by the
above-described Metropolis algorithm. Its transition probabilities are defined as
follows.

First one defines the infinite-temperature transition probabilities $\Pi_{\beta\alpha} =
\Gamma_{\beta\alpha}(\infty)$ from state $\alpha$ to $\beta$ by

$$\Pi_{\beta\alpha} = \begin{cases} 0 & \text{if } \beta \notin N(\alpha) \\ 1/|N(\alpha)| & \text{if } \beta \in N(\alpha), \end{cases} \qquad (4)$$

where $|N(\alpha)|$ is the number of neighbors of $\alpha$. These are the transition
probabilities if the algorithm automatically accepts each attempted move, i.e., if
$T \to \infty$.
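
At finite temperature the acceptance decision is superimposed on $\Pi$ in (4)
to give $\Gamma(T)$ defined by

$$\Gamma_{\beta\alpha} = \begin{cases} \Pi_{\beta\alpha} \exp(-\Delta E/T) & \text{if } \Delta E > 0,\ \alpha \ne \beta \\ \Pi_{\beta\alpha} & \text{if } \Delta E \le 0,\ \alpha \ne \beta \\ 1 - \sum_{\xi \ne \alpha} \Gamma_{\xi\alpha} & \text{if } \alpha = \beta, \end{cases} \qquad (5)$$

where now $\Delta E = E(\beta) - E(\alpha)$.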
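As a numerical illustration of (3) to (5), the following sketch builds the Metropolis transition matrix for a small state space and checks that a Boltzmann distribution with degeneracies equal to the number of neighbors is left unchanged by one step of the master equation. The four-state path graph and its energies are made-up values, and the helper names `metropolis_matrix` and `step` are hypothetical.

```python
import math

def metropolis_matrix(energies, neighbors, T):
    """Return G with G[b][a] = transition probability from a to b, following (4)-(5)."""
    n = len(energies)
    G = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in neighbors[a]:
            d_e = energies[b] - energies[a]
            accept = 1.0 if d_e <= 0 else math.exp(-d_e / T)
            G[b][a] = accept / len(neighbors[a])
        # Diagonal entry: probability of staying in a (all rejected moves).
        G[a][a] = 1.0 - sum(G[b][a] for b in range(n) if b != a)
    return G

def step(G, p):
    """One application of the master equation (3)."""
    n = len(p)
    return [sum(G[a][b] * p[b] for b in range(n)) for a in range(n)]

# Hypothetical example: path graph 0-1-2-3 with made-up energies.
energies = [0.0, 1.0, 0.5, 2.0]
neighbors = [[1], [0, 2], [1, 3], [2]]
T = 0.7
G = metropolis_matrix(energies, neighbors, T)

# Candidate stationary distribution g_a exp(-E_a/T) / Z with g_a = |N(a)|.
w = [len(neighbors[a]) * math.exp(-energies[a] / T) for a in range(4)]
Z = sum(w)
p_eq = [x / Z for x in w]
p_next = step(G, p_eq)   # should reproduce p_eq up to rounding
```

The check works for any undirected neighborhood graph, since the flow from $\alpha$ to $\beta$ and the reverse flow both reduce to the smaller of the two Boltzmann factors.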



2.3 Thermal Relaxation Dynamics: A Simple Example 



A simple example will show why the Metropolis algorithm creates a Boltzmann
distribution as the stationary solution of (3) in state space. Let us consider a
(rather artificial) state space with states numbered by natural numbers $i$ and a
neighborhood relation such that states with successive numbers are neighbors,
$N(i) = \{i-1, i+1\}$ and $N(0) = \{1\}$. The energy is $E(i) = i E_0$. Thus, for $i \ge 1$, the
transition probabilities are 0, apart from

$$\Gamma_{i+1,i} = \tfrac{1}{2}\exp(-E_0/T), \qquad \Gamma_{i-1,i} = \tfrac{1}{2}, \qquad \Gamma_{i,i} = 1 - \tfrac{1}{2}\left(\exp(-E_0/T) + 1\right). \qquad (6)$$

State 0 has to be treated differently: as it has no lower neighbor we find
$\Gamma_{1,0} = \exp(-E_0/T)$ and $\Gamma_{0,0} = 1 - \exp(-E_0/T)$.

In the stationary solution $P^{\mathrm{eq}}_i$ is constant, thus the probability flow $\Gamma_{1,0} P^{\mathrm{eq}}_0$
from state 0 to state 1 has to equal the reverse flow $\Gamma_{0,1} P^{\mathrm{eq}}_1$ from state 1 to state
0. Then the same must be true for the flow between state 1 and state 2, and so
forth. So we find

$$\Gamma_{i+1,i}\, P^{\mathrm{eq}}_i = \Gamma_{i,i+1}\, P^{\mathrm{eq}}_{i+1} \qquad (7)$$

and thus for $i > 0$

$$\frac{\Gamma_{i+1,i}}{\Gamma_{i,i+1}} = \exp(-E_0/T) = \frac{\exp(-(i+1)E_0/T)}{\exp(-iE_0/T)} = \frac{P^{\mathrm{eq}}_{i+1}}{P^{\mathrm{eq}}_i}, \qquad (8)$$

as expected for the Boltzmann distribution.
For $i = 0$ we find

$$\frac{P^{\mathrm{eq}}_1}{P^{\mathrm{eq}}_0} = \frac{\Gamma_{1,0}}{\Gamma_{0,1}} = \frac{\exp(-E_0/T)}{1/2} = 2\exp(-E_0/T), \qquad (9)$$

which shows that the infinite temperature equilibrium distribution gives twice
the probability to state 1 as it gives to state 0: $g_1/g_0 = 2$.
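The two ratios derived for this chain can also be estimated by direct simulation. The sketch below runs the Metropolis algorithm on the chain with $N(0) = \{1\}$, $N(i) = \{i-1, i+1\}$ and $E(i) = iE_0$, and counts visits. The parameter values ($E_0 = T = 1$), the number of steps, and the seed are arbitrary assumptions.

```python
import math
import random

def chain_step(i, e0, T, rng):
    """Metropolis step on the chain: N(0) = {1}, N(i) = {i-1, i+1}, E(i) = i*e0."""
    j = 1 if i == 0 else i + rng.choice((-1, 1))
    d_e = (j - i) * e0
    if d_e <= 0 or rng.random() < math.exp(-d_e / T):
        return j
    return i

rng = random.Random(1)
e0, T, steps = 1.0, 1.0, 200_000
counts = {}
i = 0
for _ in range(steps):
    i = chain_step(i, e0, T, rng)
    counts[i] = counts.get(i, 0) + 1

ratio_10 = counts[1] / counts[0]   # compare with 2*exp(-e0/T)
ratio_21 = counts[2] / counts[1]   # compare with exp(-e0/T)
```

Up to statistical noise, `ratio_10` should lie near $2\exp(-E_0/T)$ and `ratio_21` near $\exp(-E_0/T)$, the factor 2 reflecting the different numbers of neighbors of states 0 and 1.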

Usually statistical mechanics requires for a microscopic state space that the
stationary distribution $P^{\mathrm{eq}}_\alpha(\infty) = g_\alpha/Z$ of $\Pi = \Gamma(\infty)$ has to be uniform [5]. An
easy way to achieve this is to make $|N(\alpha)|$ constant over $\alpha$, i.e., to make each
vertex have the same number of neighbors. If however nonuniform degeneracies
$g_\alpha$ are needed, then the transition rates in the Metropolis algorithm need to
be adjusted accordingly by choosing an appropriate move class. Note that one
has only one free choice: either one defines a move class, which then fixes the
degeneracies, or one requires certain degeneracies and that puts constraints on
the move class.

If in a given system (7) holds for all neighbors,

$$\Gamma_{\beta\alpha}\, P^{\mathrm{eq}}_\alpha = \Gamma_{\alpha\beta}\, P^{\mathrm{eq}}_\beta, \qquad (10)$$

then the system is said to obey detailed balance, as the probability flows along
each bond of the neighborhood graph balance out separately. While in the above
example detailed balance holds due to its structure, in more complicated systems
one has to prove or assume that it holds.



3 Modeling Constant-Temperature Thermal Relaxation 

In this section we want to study the thermal relaxation of a complex system
at constant temperature by means of a Markov process model. We are then
immediately confronted with a major problem common to macroscopic physical
systems described on a microscopic level: the enormous number of states. Some
simplification is called for, and a starting point for such an operation is given
by the rough qualitative picture of the thermal relaxation process painted as
a random hopping process in a mountain range (or diffusion if considered in a
continuous time limit).

This picture suggests that - as in real mountains - the movement within one 
valley is easy compared to the movement from one valley into another, which 
involves the crossing of a pass. In terms of the thermal relaxation this means that 
within such a valley in state space the relaxation proceeds quite fast and local 
equilibrium is obtained after a short while. Then the barrier crossing between 
two valleys takes much longer and - as there are valleys inside larger valleys 
inside larger valleys etc. - we have to expect a whole spectrum of longer and 
longer relaxation times. 

Note that, again as in real mountains, the transition of a pass might not only 
be influenced by the height (the energy) of the pass alone, there might be also 
a dynamical restriction (for instance the width of a pass) which needs its own 
modeling. Nonetheless it is important to realize that already the complex energy 
landscape leads to a slow down of the relaxation. 

3.1 Coarse-Graining a Complex State Space

In this subsection we show how the above concept can be used in a more technical 
fashion. 




Fig. 2. The coarse-graining of a complex state space is shown. All connected states 
within one energy band (here indicated by the dashed lines) are lumped into the nodes 
of a tree 



Figure 2 depicts the underlying idea [6]. Suppose one introduces hyperplanes
with energy

$$E(l) = \Delta E_t\, l \qquad (11)$$

into the state space and calls all states between two neighboring hyperplanes an 
energy band. The connected (in the sense of the neighborhood graph introduced 
above) parts of the state space within an energy band are then lumped together 
to represent the nodes of a new coarse-grained structure. 

If $\Delta E_t$ is large compared with the energy change $\Delta E_{\mathrm{micro}}$ of a transition
between microstates, then the connectivity between the microstates induces a
unique connectivity between the nodes as indicated in Fig. 2. The resulting
structure has a tree topology. Each node has only one connection to a node of
higher energy, which is called the mother node. The connected nodes at lower energy
are called daughter nodes. The number of daughter nodes will vary from node
to node. Also the number of states lumped into a node will vary. In general, we
expect that the higher the nodes are in energy the more states they will include,
i.e., the degeneracy of a node will increase with its energy.

There are also other coarse-graining procedures [6] which lead to trees with
nodes not separated by equal energy intervals. The important point is that a
hierarchical tree-like structure is the result, and thus trees can be regarded as a
generic coarse-grained structure for complex state spaces.

An even coarser lumping would collect all states belonging to one energy
band into one node. This leads directly to the simple example presented in Sect.
2.3. However, this would lump [7] states together which could reach each other in
the underlying complex state space only via higher barriers, i.e., states belonging
to a higher energy band. Thus the essential feature of a complex state space, the
crossing of barriers, would be neglected and so all the complexity of the state
space would be lost in the model! It is thus important that only the connected
parts of the state space are lumped together.
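The lumping rule, which merges only states that are connected within the same energy band, can be sketched as a search for connected components. The function name `coarse_grain`, the band width, and the seven-state toy landscape below are assumptions made for illustration; they are not taken from [6].

```python
import math

def coarse_grain(energies, neighbors, band_width):
    """Lump connected states within one energy band into nodes.

    Returns (node_of_state, band_of_node)."""
    band = [math.floor(e / band_width) for e in energies]
    node_of = [None] * len(energies)
    band_of_node = []
    for start in range(len(energies)):
        if node_of[start] is not None:
            continue
        node = len(band_of_node)
        band_of_node.append(band[start])
        node_of[start] = node
        stack = [start]
        while stack:  # depth-first search restricted to the band of `start`
            s = stack.pop()
            for t in neighbors[s]:
                if node_of[t] is None and band[t] == band[start]:
                    node_of[t] = node
                    stack.append(t)
    return node_of, band_of_node

# Hypothetical 1D landscape: two valleys separated by a barrier at state 3.
energies = [0.2, 0.7, 1.3, 2.4, 1.6, 0.9, 0.1]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5]]
node_of, band_of_node = coarse_grain(energies, neighbors, band_width=1.0)
# node_of == [0, 0, 1, 2, 3, 4, 4]: the two valley floors stay separate nodes
```

In this toy landscape states 0, 1 and states 5, 6 all lie in the lowest band, yet they end up in two different nodes because they can only reach each other across the barrier. Lumping by band alone would merge them and lose exactly this barrier.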

Consider now the thermal hopping in the complex energy landscape. Rather
than calculating the full time dependence of the probability distribution in the
state space, we can choose to monitor the presence in or absence from a node of the
corresponding tree. We have hereby defined a stochastic process, which in general
will not be a Markov process because the induced transition probability from
node to node might depend on the internal (microscopic) distribution within
one node.

However, it turns out [8] that inside a coarse-grained area a kind of local
equilibrium distribution is very quickly established, which then makes the
coarse-grained relaxation process (at least approximately) Markovian. The result is that
Markov processes on tree structures are good modeling tools for the thermal
relaxation of complex systems [6, 9, 10].

3.2 Tree Dynamics 




Fig. 3. The construction of an LS tree 



Thermal relaxation dynamics have been studied for a number of tree structures.
We present here one example: the LS tree. Following [11, 12] the tree is
constructed as shown in Fig. 3. The building block is a ‘mother’ node connected
to two ‘daughters’ of lower energy, the energy differences being $\Delta_1$ and $\Delta_2$
respectively. The tree is constructed iteratively out of one initial block by (a)
duplicating the already existing structure, and (b) identifying the top nodes of
the two resulting twins with the daughters of the building block. For $\Delta_1 = \Delta_2$
the structure is regular and coincides with that studied, e.g., in [9].

A level in the tree is the set of nodes connected by the same number of bonds
to the root (the top node of the tree). Each node has thus a unique level index $l$,
with the local energy minima being at level 0, their mothers at level 1 and
so forth up to the root node being at level $l_{\max}$, which is the height of the tree.
Each node is contained in a unique subtree of height $m$, dubbed the $m$th subtree
containing that node. The whole tree contains $N = 2^{l_{\max}+1} - 1$ nodes. For each
node $\alpha$ we introduce a degeneracy $g(\alpha)$, which can account for the number of
microstates lumped into the node in a coarse-graining procedure. We assume a
level-dependent degeneracy $g(\alpha) = g(l) = \kappa^l$ for all nodes at level $l$; $\kappa$ is a free
model parameter.

For the dynamics it is more convenient to use a continuous time description

$$\frac{d}{dt} P_\alpha(t) = \sum_\beta \Gamma_{\alpha\beta}\, P_\beta(t) \qquad (12)$$

with the transition matrix $\Gamma = (\Gamma_{\alpha\beta})$. Specifically the transition
probability is chosen to be:

$$\Gamma_{\mathrm{mother},\mathrm{daughter}} = f_j\, \kappa \exp(-\Delta_j/T), \qquad (13)$$

$$\Gamma_{\mathrm{daughter},\mathrm{mother}} = f_j, \qquad (14)$$

where $\Delta_j = \Delta_1, \Delta_2$ indicates the energy difference between mother and daughter
nodes. Note that the parameter $f_j$ does not change the (Boltzmann)
equilibrium distribution on the tree, it only defines the time scale.

The time development of the above-defined model has been thoroughly
studied by numerical diagonalization. We performed most calculations on a tree of
height $l_{\max} = 7$, leading to a $255 \times 255$ transition matrix $\Gamma$. Finite size effects
for the temperature range of interest were found to be quite negligible for this
system size.
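The following sketch builds a small LS tree and checks that the Boltzmann distribution with level-dependent degeneracy $\kappa^l$ is stationary under the continuous-time master equation (12). It assumes upward rates $f_j \kappa \exp(-\Delta_j/T)$ (daughter to mother) and downward rates $f_j$ (mother to daughter); the heap-style node numbering, the helper names, and all parameter values are hypothetical. It does not reproduce the numerical diagonalization mentioned above, only the stationarity check.

```python
import math

def ls_tree_rates(height, d1, d2, f1, f2, kappa, T):
    """Rates on a binary LS tree with heap indexing (root = node 0).

    Returns (rates, level, energy); rates[(to, frm)] is the rate frm -> to."""
    n = 2 ** (height + 1) - 1
    level = [0] * n
    energy = [0.0] * n          # energies measured relative to the root
    level[0] = height
    rates = {}
    for mother in range(n):
        for j, (delta, f) in enumerate(((d1, f1), (d2, f2)), start=1):
            daughter = 2 * mother + j
            if daughter >= n:
                continue
            level[daughter] = level[mother] - 1
            energy[daughter] = energy[mother] - delta
            rates[(mother, daughter)] = f * kappa * math.exp(-delta / T)  # upward
            rates[(daughter, mother)] = f                                  # downward
    return rates, level, energy

def time_derivative(rates, p):
    """Right-hand side of (12): gain minus loss for each node."""
    dp = [0.0] * len(p)
    for (to, frm), r in rates.items():
        dp[to] += r * p[frm]
        dp[frm] -= r * p[frm]
    return dp

height, kappa, T = 3, 2.0, 0.5
rates, level, energy = ls_tree_rates(height, 1.0, 1.5, 1.0, 3.0, kappa, T)
w = [kappa ** l * math.exp(-e / T) for l, e in zip(level, energy)]
Z = sum(w)
p_eq = [x / Z for x in w]
dp = time_derivative(rates, p_eq)   # every entry should vanish up to rounding
```

Changing `f1` and `f2` leaves `p_eq` stationary, which illustrates that these prefactors only set time scales.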

Consider the case where initially all probability is concentrated within one
local minimum. The overall and important result is that the equilibration
proceeds by a sequence of partial equilibria in successively larger subtrees. Figure
4 shows for instance the time development of the probability within subtrees of
different heights. One very characteristic feature in this figure is the power-law
decay of the probability, which can be traced back to the regular hierarchical
structure of the model.

In order to see the effect of the parameter $f_j$ let us consider the typical
situation in which a complex system is quenched into a low-energy state. If the
low-energy part of the phase space can meaningfully be modeled by a tree, highly
asymmetric downward rates imply that the initial condition for the thermal
relaxation process following the quench can be concentrated into a small region.
In our simple model, an arbitrarily sharp initial condition can be obtained for
instance by choosing $f_1/f_2$ sufficiently large. More generally we expect the values
of $f_j$ to play a role when temperature variations are important [13].




Metastable Systems and Stochastic Optimization 




Fig. 4. The probability contents in subtrees of different height (1 to 7) are shown as a 
function of time. Note the power-law dependence 



3.3 A Serious Application: Aging Effects in Spin Glasses 

We now present an application of the above ideas to demonstrate the power of the 
modeling approach. We show that a number of highly interesting experimental 
results for spin glasses can be explained by a very simple relaxation model based 
on a tree structure. 

First we briefly describe the experiments. Aging effects were first observed
in spin glass systems [14, 15, 16, 17, 18, 19] and have also been measured in
high-$T_c$ superconductors [20] and CDW systems [21]. In the so-called
Z(ero)F(ield)C(ooled) experiments a sample is cooled to a low temperature and
‘aged’ for a ‘waiting time’ $t_w$, without field. Thereafter a small field is applied,
and the response of the magnetization to the perturbation is measured. Contrary
to one’s naive expectations, the response depends both on the time during which
the system has been acted upon by the field and on the waiting time. This
situation persists through many decades, and indicates that the system never
reaches thermodynamic equilibrium during the observation time.

In the spin glass ZFC magnetization experiments the applied field $H$ is kept
constant, and the salient feature of the data is a kink in the magnetization
plotted as a function of logarithmic time at $t = t_w$, or equivalently,
a maximum in the derivative $S(t, t_w)$ of the magnetization with respect to the
logarithm of the time at $t = t_w$. This feature is only present for temperatures
below the critical spin glass temperature $T_g$.

Rather than trying to coarse-grain the state space of a microscopic spin glass 
model, the starting point for the model is a tree structure as presented in the 
previous subsection. We now show that the model is able to reproduce the main 
aging features of the experiments. In order to calculate response properties, we 
use linear response theory







( 15 ) 



with the nonequilibrium response function $R(t', t_w)$ [22] rather than the usual
correlation function, which is not appropriate for this nonequilibrium situation.

It would be beyond the scope of this paper to discuss the response function
in detail. Here it is important that, apart from magnetic properties
which have to be defined in addition, $R(t', t_w)$ depends crucially on the solution
$P_\alpha(t)$ of (12).





Fig. 5. A comparison between the experimental data (left) for ZFC experiments and 
the tree-model data (right) 



Figure 5 shows a comparison between the experimental data and the model 
result. Note that the latter reproduces very well the maxima in the relaxation 
rate at times which correspond to the waiting times [23, 24]. 

There are a number of further experiments which measure the response as 
temperatures are changed during the waiting time. This leads to partial reini- 
tialization effects which can also be reproduced very well by this simple model. 
For an in-depth discussion see [13]. 

Finally we note that a theoretical understanding of the physical mechanism 
behind the aging phenomenon of spin glasses has important implications for the 
more general issue of describing relaxation phenomena in complex systems such 
as glasses and polymers, as well as some nonphysical systems which are currently 
treated by statistical mechanical methods, for instance Boltzmann machines and 
simulated annealing schemata [25]. 

4 Stochastic Optimization: How to Find the Ground State of Complex Systems

Whereas the previous section was devoted to the thermal relaxation of complex 
systems in contact with a fixed-temperature bath, we shall now turn towards 
the relaxation process under the influence of temperature changes. The reason 
for doing so is the problem of finding the ground state in complex systems. By 
now it is certainly clear that the multivalley state space structure with its many 
local minima makes it quite complicated to find the ground state. Simple descent 
methods which rely on following the energy function down to states with lower 
and lower energy will invariably lead into one of the many local minima, but 
rarely into the global one. Recent investigations [26] show that the number of 
local minima can grow exponentially with the system size, thus just hoping to 
find the ground state by chance or a few repeated trials does not work. So, other 
methods are needed. 



4.1 Simulated Annealing 

One idea for solving this problem was born in the early eighties: simulated
annealing [27, 28]. It is based on the observation that a careful annealing of physical
systems brings them closer to equilibrium than a quenching process. This is well
known and widely used in the preparation of single crystals or in the
manufacturing of telescope mirrors. Careful annealing of a real physical system brings
it into equilibrium with the ambient temperature $T$ and thus for $T \to 0$ the
system moves into its ground state(s).

But how can this fact be used for finding ground states of spin glass models? 

The answer is simple. Simulate the thermal relaxation of the system by the 
Metropolis algorithm and slowly turn down the temperature parameter which 
enters in the transition rate: simulated annealing! 

One quickly realizes that this procedure has a much wider application range 
than physical models. Given a state space with an (energy) function with many 
local minima for which the global minimum is sought, all one needs to do is 
to artificially introduce a neighborhood relation on the states and then perform 
a random hopping process according to the Metropolis rules, where the tem- 
perature is now just an external parameter which has to be lowered properly. 
Simulated annealing can thus be used as a general stochastic optimization tool. 
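A generic version of this procedure fits in a few lines. The sketch below is a minimal illustration under stated assumptions: the function `simulated_annealing`, the rugged test function on the integers 0 to 99, the one-step move class, and the exponential cooling schedule are all hypothetical choices, not taken from the references.

```python
import math
import random

def simulated_annealing(start, energy, random_neighbor, schedule, rng):
    """Metropolis hopping with the temperature taken from `schedule` at each step."""
    state, e = start, energy(start)
    for T in schedule:
        cand = random_neighbor(state, rng)
        e_cand = energy(cand)
        d_e = e_cand - e
        # At T = 0 only downhill or flat moves are accepted (pure descent).
        if d_e <= 0 or (T > 0 and rng.random() < math.exp(-d_e / T)):
            state, e = cand, e_cand
    return state, e

# Hypothetical objective with many local minima on the integers 0..99.
def energy(x):
    return 0.01 * (x - 70) ** 2 + 3.0 * math.cos(x)

# Artificially introduced neighborhood: step one integer left or right.
def random_neighbor(x, rng):
    return min(99, max(0, x + rng.choice((-1, 1))))

rng = random.Random(0)
schedule = [5.0 * 0.999 ** k for k in range(20000)]   # exponential cooling
final_state, final_energy = simulated_annealing(10, energy, random_neighbor, schedule, rng)
```

Passing a schedule of zeros turns the same routine into a simple descent method, which makes it easy to compare how often each approach ends in a poor local minimum.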

Simulated annealing has been applied to a wide range of problems with a
complex state space structure. Apart from finding the ground state of a spin
glass [29], simulated annealing has also proved a useful tool in the design of
integrated circuits [27, 30, 31] for partitioning, routing, and placement [32]. It
has been applied to many other problems including the traveling salesman problem [33,
34], graph partitioning [35], restoration of images [36], and parameter estimation
[37]. While this list is far from exhaustive, it shows that the problems attacked
by simulated annealing are of great scientific and industrial importance.







Karl Heinz Hoffmann 



A thorough analysis showed [36] that the simulated annealing procedure
indeed finds the ground state with a probability of 1. However, the annealing
schedule, i.e., the way in which the temperature parameter $T$ is lowered as a
function of time, needs to be very slow and is

$$T(t) \sim \frac{1}{\ln t}. \qquad (16)$$

Note that to reach the ground state with a probability of 1 one needs infinite
time.

Thus for the finite time usually available to humans one has to live with the 
fact that the ground state is not found with a probability of 1. That brings us to 
the problem of finding that schedule which provides the “best” solution we can 
get under the restriction of finite (computer) time, which in our case translates 
into a finite number of Metropolis steps. In other words the question is, what is 
the “optimal” schedule? 

Before this question can be answered the yardstick with respect to which the
optimality is determined must be defined. Indeed several criteria are possible,
the two commonly used are: (a) the final energy, (b) the best-so-far (BSF) energy
$E_{\mathrm{BSF}}(k) = \min_{0 \le k' \le k} E(\omega_{k'})$, i.e., the lowest energy seen up to a certain time $k$.

The final and the BSF energies are stochastic quantities [38]. Their proba- 
bility distribution evolves with time and is induced by the underlying random 
walk in the state space governed by the Metropolis algorithm. The distribution 
as such cannot be optimized, only certain aspects (for instance its mean, its 
median, or its mode). The choice between the different criteria has to be made 
externally. After a choice has been made, the determination of the optimal sched- 
ule becomes a new optimization problem, which can be attacked analytically or 
numerically. 
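The difference between the two criteria is easy to see on recorded energy traces. In the sketch below the three traces are made-up numbers standing in for independent annealing runs, and `best_so_far` is a hypothetical helper computing the running minimum.

```python
from itertools import accumulate

def best_so_far(energies):
    """Running minimum: the lowest energy seen up to each step of the trace."""
    return list(accumulate(energies, min))

# Hypothetical energy traces of three independent annealing runs.
traces = [
    [5.0, 3.0, 4.0, 2.0, 2.5],
    [6.0, 6.5, 1.0, 3.0, 3.0],
    [4.0, 2.0, 2.0, 2.2, 1.8],
]
final = [t[-1] for t in traces]
bsf = [best_so_far(t)[-1] for t in traces]
mean_final = sum(final) / len(final)   # criterion (a), averaged over runs
mean_bsf = sum(bsf) / len(bsf)         # criterion (b), averaged over runs
```

By construction the BSF energy of a run can never exceed its final energy, so a schedule tuned for one criterion need not be the best for the other.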



4.2 Optimal Simulated Annealing Schedules: A Simple Example 

Below we show a very simple example for which the optimal schedule has been
determined. The example system consists of only three states $\Omega = \{1, 2, 3\}$ and
shows how the crossing of a single barrier is optimized. The states have energies
$E(1) = 0$, $E(2) = 1$, and $E(3) = B > 1$, and the move class is $N(1) = \{3\}$,
$N(2) = \{3\}$, $N(3) = \{1, 2\}$. Thus 1 is the global minimum, 2 is the local one,
and 3 represents the barrier in between.

For this simple model optimal schedules for the mean final energy were
determined [39]. The knowledge of the optimal schedule allows a comparison between
different schedules. For instance it was shown that the optimal schedule
performs much better than any exponential schedule (with decay parameter $\nu$) or linear schedule
$T(t) = T_0 - \epsilon t$, thus providing some indication for the potential gains. This is
demonstrated in Fig. 6. The mean final energies for the optimal, the linear, and the
exponential schedules are compared as a function of the total available annealing
time $\tau$. The linear and the exponential schedules use the best possible values for
$\epsilon$ and $\nu$. Note how much better the optimal schedule performs compared to the
other two.

Fig. 6. A comparison between the mean final energy as a function of the available
annealing time for the best linear, the best exponential and the optimal schedules
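For the three-state model the mean final energy of any schedule can be computed exactly by propagating the probability vector with the Metropolis transition matrix. The sketch below does this for a quench and for one linear schedule. The barrier height `B = 2`, the number of steps, the starting temperature, and the choice to start in the local minimum are assumptions for illustration; the optimal schedule of [39] is not reproduced here.

```python
import math

def three_state_matrix(B, T):
    """Metropolis matrix for E = (0, 1, B), N(1) = N(2) = {3}, N(3) = {1, 2}.

    Indices 0, 1, 2 stand for states 1, 2, 3; G[b][a] is the probability a -> b."""
    up1 = math.exp(-B / T) if T > 0 else 0.0           # state 1 -> barrier
    up2 = math.exp(-(B - 1.0) / T) if T > 0 else 0.0   # state 2 -> barrier
    return [
        [1.0 - up1, 0.0, 0.5],
        [0.0, 1.0 - up2, 0.5],
        [up1, up2, 0.0],
    ]

def mean_final_energy(B, schedule, p0):
    """Propagate p0 through the schedule and return the mean energy at the end."""
    p = list(p0)
    for T in schedule:
        G = three_state_matrix(B, T)
        p = [sum(G[a][b] * p[b] for b in range(3)) for a in range(3)]
    return p[1] * 1.0 + p[2] * B   # state 1 contributes energy 0

B, steps = 2.0, 100
p0 = [0.0, 1.0, 0.0]   # assumption: all probability starts in the local minimum
quench = [0.0] * steps
linear = [1.5 * (1 - k / steps) for k in range(steps)]
e_quench = mean_final_energy(B, quench, p0)
e_linear = mean_final_energy(B, linear, p0)
```

The quench leaves the system trapped in state 2, whereas the linear schedule lets part of the probability cross the barrier. Scanning the free parameters of a schedule family with `mean_final_energy` is one way to find its best member for a given annealing time.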

Analysing this simple model shows that the barrier height enters the optimal 
schedule as an essential parameter. Later optimal schedules [40, 41] for larger 
systems were determined numerically. It turned out that the optimal schedules 
are dominated by a single barrier during certain time intervals. 

Summarizing these studies of optimal schedules one sees it is essential to hold 
the system close enough to equilibrium in order to prevent getting trapped in 
local minima. On the other hand one has yet to maintain a certain disequilibrium 
in order to anneal as quickly as possible. 

4.3 Adaptive Annealing Schedules and the Ensemble Approach to Simulated Annealing

Investigations of truly optimal schedules for simple systems [39, 41, 42] have 
shown that the schedule depends critically on the barrier height which has to 
be overcome to leave a local minimum. In the usual optimization problem these 
barrier heights are unknown, moreover they differ from problem to problem. 
Thus, the schedule has to be adapted to the problem. 

Adaptive schedules using information gathered during the annealing have
been suggested before [43, 44]. Here we present as a simple example an
adaptive schedule which has been shown to work well on spin glass and standard
traveling salesman problems. The schedule is easy to implement and has only
negligible computational overhead.

It is based on the ensemble approach to simulated annealing, in which a 
collection of copies of the system, rather than just one, is annealed according to 
the same schedule [37, 38, 45, 46]. One of the most important advantages of the
ensemble approach is that statistical information about the ensemble can be used 
to adjust the schedule with which the temperature is lowered. In principle the 
temperature T can be lowered after every Metropolis step. We decided however 
to use the more usual approach in which a certain number of steps is performed 
at the same temperature before T is lowered by a certain amount. This number 
of steps can be predetermined or it can be set adaptively during the annealing 
using the statistical data from the ensemble. 

The philosophy behind the schedule [47] is to hold the ensemble fairly close
to the equilibrium corresponding to the annealing temperature. As an indicator
for this, the ensemble average of the energy $\langle E \rangle$ is monitored. For an ensemble
close to equilibrium, $\langle E \rangle$ will fluctuate around the thermodynamic equilibrium
value corresponding to the annealing temperature, whereas for an out-of-equilibrium
situation $\langle E \rangle$ will move towards that value. The adaptive schedule is
implemented as follows:

1. Set an initial temperature $T_0$.

2. Perform $m$ Monte Carlo steps per ensemble member with the ensemble. We
   will call this a Monte Carlo sweep.

3. Let $\langle E \rangle_{(j)}$ be the ensemble average after the $j$th Monte Carlo sweep.
   IF $\langle E \rangle_{(j)} < \langle E \rangle_{(j-1)}$
   THEN GOTO 2,
   ELSE GOTO 4.

4. Reset the temperature $T_{j+1} = c\,T_j$ with $c < 1$.

5. IF the maximum number of Monte Carlo sweeps has not been reached
   THEN GOTO 2,
   ELSE end of annealing run.
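The five steps can be sketched as a single loop over Monte Carlo sweeps. This is a serial illustration, not the parallel program described in the text. The function name, the toy objective, and the ensemble size are assumptions, the ensemble average before the first sweep is taken from the initial states, and the sweep limit is checked after every sweep rather than only after a temperature change. The values $T_0 = 14.0$ and $c = 0.9$ are borrowed from the spin glass run quoted later in this section.

```python
import math
import random

def adaptive_ensemble_anneal(states, energy, random_neighbor,
                             T0, c, m, max_sweeps, rng):
    """Anneal an ensemble; lower T by the factor c whenever <E> stops decreasing."""
    states = list(states)
    energies = [energy(s) for s in states]
    T = T0                                         # step 1
    prev_mean = sum(energies) / len(energies)      # assumption: <E> before sweep 1
    history = []                                   # (T, <E>) recorded per sweep
    for sweep in range(max_sweeps):                # step 5: bounded number of sweeps
        for i in range(len(states)):               # step 2: one Monte Carlo sweep
            for step in range(m):
                cand = random_neighbor(states[i], rng)
                e_cand = energy(cand)
                d_e = e_cand - energies[i]
                if d_e <= 0 or rng.random() < math.exp(-d_e / T):
                    states[i], energies[i] = cand, e_cand
        mean_e = sum(energies) / len(energies)     # step 3
        history.append((T, mean_e))
        if mean_e >= prev_mean:                    # <E> no longer decreasing
            T *= c                                 # step 4
        prev_mean = mean_e
    return states, energies, history

# Hypothetical test problem: rugged function on the integers 0..99.
def energy(x):
    return 0.01 * (x - 70) ** 2 + 3.0 * math.cos(x)

def random_neighbor(x, rng):
    return min(99, max(0, x + rng.choice((-1, 1))))

rng = random.Random(0)
start = [rng.randrange(100) for _ in range(20)]
states, energies, history = adaptive_ensemble_anneal(
    start, energy, random_neighbor, T0=14.0, c=0.9, m=10, max_sweeps=200, rng=rng)
```

Counting how many entries of `history` share the same temperature gives the number of sweeps spent at each temperature.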

Note that even though the temperature is always lowered by a factor c, 
the schedule is not exponential as the number of Monte Carlo sweeps spent 
at each temperature varies. Due to the finite ensemble size, fluctuations will 
mask the “true” behavior of the energy average and our criterion for being close 
to equilibrium will lower the temperature even though some of the ensemble 
members are stuck in local minima. Thus we expect that sooner or later the 
ensemble will fall out of equilibrium. 

From a technical point of view the ensemble version of the simulated
annealing algorithm proceeds by randomly selecting $n$ initial states, where $n$ is the
ensemble size (owing to the size of the state space these are usually all different).
Then all these states are subjected to the same annealing schedule and evolve
according to the Monte Carlo dynamics.

We implemented the simulated annealing ensemble algorithms on a Parsytec
parallel computer using the PARIX environment and on high-performance RISC
workstations. The program consists of $n$ equivalent work processes and a master
process. The latter is responsible for the input/output, the evaluation of the
data received from the workers and for their control. Each work process handles
the Monte Carlo algorithm for one member of the ensemble. So each process has
to communicate only with the master process and only once after each Monte
Carlo sweep.

The configuration of the processor array was chosen to be a simple tree. The
master process was placed on the root of the tree. Each of the other
processors had to accommodate a certain number of work processes depending on the
ensemble size used and the number of processors available.

Fig. 7. The adaptive annealing using the ensemble approach at work. The time (i.e.,
the number of Monte Carlo sweeps) spent at each temperature varies automatically
during the adaptive annealing process. At lower temperatures the ensemble has fallen
out of equilibrium. Here the results of several runs are shown



Figure 7 demonstrates how the adaptive schedule works. It shows for a par- 
ticular optimization problem the number of Monte Carlo sweeps as a function 
of the temperature T. One clearly sees that with decreasing T the time (i.e., the 
number of Monte Carlo sweeps) spent at that temperature increases first, and 
then starts to decay once the trapping of the ensemble members has started. 

Figure 8 [48] shows how during the annealing the ensemble is transported
towards lower energies. While initially the distribution moves down easily it
starts to sharpen when it gets close to the ground state. In this example a spin
glass system was annealed which has an approximate ground state energy around
$E_{\min} = -780$ in the units used for Fig. 8. The simulated annealing procedure
was carried out adaptively with $c = 0.9$ and $T_0 = 14.0$.

Finally we remark that in a number of cases the time dependence of the
moments $\langle (E_{\mathrm{BSF}} - E_{\min})^n \rangle$ of the BSF energy can be reasonably well fitted
by power laws for a certain time span [48, 49]. This shows again the connection
to the thermal relaxation in tree models, where these power laws also occur.



Fig. 8. Energy distribution for a spin glass problem during the annealing run at differ- 
ent times, indicated by the number of Monte Carlo sweeps. The simulated annealing 
procedure was used with an adaptive schedule 



5 Summary 

In this paper we have shown that metastable systems and stochastic optimization
are fields that are intimately connected through the concept of a complex state
space, i.e., a space with a function with many local minima. The dynamics in this
state space, which is induced by a contact with a heat bath - real or artificial
- can be analysed by simulations based on the Metropolis algorithm or by a
Markov-process description. This was demonstrated with a simple example.

We have then presented two serious applications: the thermal relaxation be- 
havior of spin glasses and optimized simulated annealing procedures. In the first 
application the coarse-graining of the complex state space leading to a tree struc- 
ture is an important intermediate step. In the second application the need for 
adaptive annealing procedures was pointed out and a first solution was presented. 



Appendix: Examples and Exercises (with S. Schubert) 

As an example of stochastic optimization we study the traveling salesman
problem (TSP). The aim of the TSP is to find the shortest tour connecting a given
number of points in an x-y plane, which possibly represent towns that a salesman
has to visit.

In order to show how this problem falls within the formal framework set up in
Sect. 2.1, we note that a single tour, i.e., a sequence of towns, constitutes a state
$\alpha$ in the state space $\Omega$. The next step is to define a neighborhood relation. One
possible method to create a neighbor of a given tour is to choose two connections
along the tour randomly, to cut them, and to insert one part of the tour in
reversed order. Another move class chooses two towns randomly and exchanges
their position in the tour. Finally the tour length is the equivalent of the energy
and is then the objective function, which is to be minimized. The TSP is known
to be NP-hard, and thus stochastic optimization procedures such as simulated
annealing or threshold accepting [50] have been used to search for the minimal
tour.



Simulated Annealing. The program tsp_sa.c shows a simple simulated
annealing (SA) procedure for the TSP. The number of towns is N (take N = 6
initially) and they are supposed to be on the circumference of a circle. Thus the
shortest tour is well known and we are able to compare the result of the program
with the real minimum state.

The tour is represented by a one-dimensional integer field tour[i] which
contains the numbers of towns in the order in which they are visited.

In the outer loop the temperature is decreased exponentially after each Monte 
Carlo sweep. Here the number of Monte Carlo steps per sweep is set equal to 
the number of towns. 

One Monte Carlo step consists of the following steps: 

1. propose a move; 

2. calculate the cost $\Delta r = r_{\rm new} - r_{\rm old}$, where $r$ denotes the length of the tour;

3. draw a random number $z$, uniformly distributed between 0 and 1;

4. accept the new tour if $\exp(-\Delta r/T) > z$, else reject the new move.
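The four steps above, together with the exponential cooling after each sweep,
can be sketched as follows. This is a minimal Python illustration, not the
program tsp_sa.c itself; the starting temperature, the cooling factor, the
number of sweeps and the use of the segment-reversal move are assumptions
chosen for the example.

```python
import math
import random


def circle_towns(n):
    """Place n towns evenly on the unit circle."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]


def tour_length(tour, towns):
    n = len(tour)
    return sum(math.dist(towns[tour[i]], towns[tour[(i + 1) % n]])
               for i in range(n))


def anneal(towns, t_start=1.0, cooling=0.95, sweeps=200, seed=0):
    """Simulated annealing with segment reversal as the move class.
    All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    n = len(towns)
    tour = list(range(n))
    rng.shuffle(tour)
    length = tour_length(tour, towns)
    best_tour, best_length = list(tour), length
    temperature = t_start
    for _ in range(sweeps):
        for _ in range(n):                      # one sweep = N Monte Carlo steps
            i, j = sorted(rng.sample(range(n), 2))          # 1. propose a move
            candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
            new_length = tour_length(candidate, towns)
            delta = new_length - length                     # 2. cost
            z = rng.random()                                # 3. random number
            # 4. accept if exp(-delta/T) > z; for delta <= 0 this always
            # holds, and the shortcut avoids an overflow at low temperature.
            if delta <= 0 or math.exp(-delta / temperature) > z:
                tour, length = candidate, new_length
                if length < best_length:
                    best_tour, best_length = list(tour), length
        temperature *= cooling                  # exponential cooling schedule
    return best_tour, best_length
```

For N = 6 towns on the unit circle the shortest tour is the regular hexagon,
so the returned length can be compared directly with 6 times the side length
2 sin(pi/6).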



Threshold Accepting. The program tsp_ta.c shows a simple example of a 
traveling salesman problem using another technique, namely threshold accept- 
ing (TA). TA differs from SA in the decision rule for accepting or rejecting a 
neighbor. In TA a neighbor is accepted if the cost function decreases or if it 
increases by less than a given external parameter, which plays the role of the 
temperature. The computational advantage of TA is that no exponential
function has to be calculated and no further random number is needed.
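The TA decision rule differs from the SA rule only in the acceptance test. The
sketch below is a Python illustration under assumed parameter values (initial
threshold, shrink factor, number of sweeps); it is not the program tsp_ta.c,
and the helper names are hypothetical.

```python
import math
import random


def tour_length(tour, towns):
    n = len(tour)
    return sum(math.dist(towns[tour[i]], towns[tour[(i + 1) % n]])
               for i in range(n))


def accept_threshold(delta, threshold):
    """TA rule: accept if the cost decreases or increases by less than
    the threshold. No exponential and no extra random number is needed."""
    return delta < threshold


def threshold_accepting(towns, threshold=0.5, shrink=0.9, sweeps=150, seed=0):
    """Threshold accepting with segment reversal as the move class.
    The schedule (multiplying the threshold by a constant factor after
    each sweep) is an assumption made for this sketch."""
    rng = random.Random(seed)
    n = len(towns)
    tour = list(range(n))
    rng.shuffle(tour)
    length = tour_length(tour, towns)
    best_tour, best_length = list(tour), length
    for _ in range(sweeps):
        for _ in range(n):
            i, j = sorted(rng.sample(range(n), 2))
            candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
            new_length = tour_length(candidate, towns)
            if accept_threshold(new_length - length, threshold):
                tour, length = candidate, new_length
                if length < best_length:
                    best_tour, best_length = list(tour), length
        threshold *= shrink
    return best_tour, best_length
```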



Exercise. Write a program with many more and randomly spread towns (graph- 
ical output). Try out different annealing schedules (linear, exponential, adaptive) 
and try to find the optimal range for the parameter T. See the self-explanatory
code for the two example programs on the enclosed diskette.



Abstract. The purpose of this chapter is twofold: to give an introduction to the
physics of granular media, emphasizing modern concepts and research topics, and to 
review the models most pertinent to the computer simulation of these phenomena. Soft- 
particle molecular dynamics, event-driven molecular dynamics, the contact dynamics of 
Moreau and Jean, and the bottom-to-top-restructuring model are discussed in detail. 
Their range of applicability is carefully assessed, artefacts are pointed out, and key 
results obtained with the respective methods are presented. 



1 The Physics of Granular Media 

1.1 What are Granular Media? 

Sand, pellets, coal and grains all share an important property: they can flow, for
example, through hoppers. However, in contrast to a fluid they also form piles;
in fact, compact granular material can be hard like a solid. One doesn’t have to
worry that one’s child may drown in the sandbox. (It may drown in a silo filled
with corn, however; such accidents have happened.) This duality between fluid-
and solid-like behavior has been the reason why granular media have been of 
central technological importance since the beginnings of civilization: solids can 
only be processed in the granular form. The term “granular media” is also used, 
for example, for suspensions (grains in fluid) or pastes (cohesive). However, in 
this chapter I want to restrict myself to dry, cohesionless granular media. 

Besides the technological importance and the exotic phenomena, some of 
which will be described below, there are basic physical properties which make 
granular materials an interesting research subject [1, 2, 3, 4]. Important key 
words are disorder, threshold dynamics and dissipative interactions, and their 
consequences, stick-and-slip motion, pattern formation, and fluctuations on
various scales. In particular, the passage from microscopic laws to macroscopic
behavior, i.e. the prediction of characteristic length and time scales pertinent to
the dynamic behaviour, still poses many fundamental questions. Many of these
questions became tractable only because of a new research tool: high performance
computers.

It is instructive to define granular media in terms of characteristic energies
(see Table 1). For clusters up to ca. $10^{6}$ molecules (such as soot particles), the
thermal energy at room temperature is more important than gravitational energy
differences on the length scale of the cluster. For granular media this is not the
case. Typical Van der Waals energies are of the same order as the thermal energy
at room temperature, so that cohesion also becomes weak for particles larger than
1 µm. However, this estimate is rather rough, as other attractive forces between
the particles may become important. Cohesion is not negligible if powders are a
little wet, or even among pebbles, if they are covered with glue!

Table 1. Characteristic properties of granular media

                   radius [m]     number of molecules

  molecule         $10^{-10}$     $1$
  soot particle    $10^{-8}$      $10^{6}$
  dust, powder     $10^{-6}$      $10^{12}$
  sand             $10^{-4}$      $10^{18}$
  gravel           $10^{-2}$      $10^{24}$

Particles in a granular medium interact when they collide. (Long-range in- 
teractions due to electrostatic charging will not be considered here.) For the 
dynamics of granular media it is essential that the interaction is dissipative and 
nonlinear. The dissipation is due to viscoelasticity (plastic deformation) and to
Coulomb friction, caused by the surface roughness of the grains:

$$F_t = -\mu_d F_n\,\mathrm{sign}(v_t), \quad \text{for } v_t \neq 0, \tag{1}$$

where $\mu_d$ is the coefficient of dynamic friction, $F_n$ the normal force at the contact,
and $v_t$ the relative tangential velocity of the two particles. The nonlinearity is
also of twofold origin: the Coulomb law is discontinuous at zero velocity. In fact,
the static friction force $\mu_s F_n$ with $\mu_s > \mu_d$ has to be overcome to trigger relative
motion of the particles in contact. Moreover, the elastic restoring force leading
to a reversal of normal (and to some extent also tangential) relative velocities
depends on the shape of the particles and is in general different from the linear
Hooke’s law.



1.2 Stress Distribution in Granular Packing: Arching 

In 1852 Hagen [5] discovered that, as an immediate consequence of Coulomb
friction, the pressure at the bottom of a container filled with sand is essentially
independent of the filling height. This phenomenon is due to so-called arching,
which means that forces are transmitted to the side walls such that lower parts
of the filling do not have to carry the weight of the parts above. This is the
reason why the flow velocity in an hourglass does not depend on the filling
height. The constant pressure is in marked contrast to an incompressible liquid,
for which the pressure increases linearly with depth.

Fig. 1. Two-dimensional static assembly of discs in a horizontal container with three
fixed walls and one piston, on which a fixed force is applied. Network of contacts with
non-vanishing normal forces, whose strength is indicated by the width of the connecting
lines. Contact dynamics simulation as described in Sect. 4. From [7]



The simple derivation of this result [6] for a cylindrical container of radius
$w$ starts from calculating the pressure increase from the top to the bottom of
a horizontal slice of thickness $\Delta x$. It is due to the weight $G = \pi w^2 m \rho g \Delta x$ of
this slice, where $m$ is the average mass, $\rho$ the number density of the grains,
and $g$ the gravitational acceleration. The weight is partly compensated by the
friction at the container walls. This friction is proportional to the normal force
$F_n = 2\pi w \Delta x\, p(x)$, where $p(x)$ denotes the pressure a distance $x$ away from
the surface of the filling. As the static friction force is undetermined within the
bounds $\pm\mu_s F_n$, the value of the proportionality constant $\mu$ depends on the way in
which the container was filled. (Actually, in pathological cases, the friction force
may even have the same direction as $G$, in which case Hagen’s observation is no
longer correct. This happens when the packing is under compression stabilized
by “upside-down” arches.) The pressure change, $\Delta p = (G - \mu F_n)/\pi w^2$, can easily
be integrated to give

$$p(x) = p_\infty \left(1 - \exp(-2\mu x/w)\right), \quad \text{where } p_\infty = m \rho g w / 2\mu. \tag{2}$$

Near the surface, $x \to 0$, the pressure increases linearly as in an incompressible
fluid; however, at a depth comparable to the radius of the container the pressure
approaches a constant value $p_\infty$.
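The slice-by-slice argument can be checked numerically by summing the pressure
changes of thin slices and comparing with the closed form (2). The Python
sketch below is only an illustration; the function names and the parameter
values in the usage note are assumptions, and weight_density stands for the
product $m \rho g$.

```python
import math


def janssen_pressure(x, w, mu, weight_density):
    """Closed form (2): p(x) = p_inf * (1 - exp(-2 mu x / w))."""
    p_inf = weight_density * w / (2.0 * mu)
    return p_inf * (1.0 - math.exp(-2.0 * mu * x / w))


def integrate_slices(depth, w, mu, weight_density, n_slices=100000):
    """Add up the pressure changes of thin horizontal slices."""
    dx = depth / n_slices
    p = 0.0                                   # pressure at the free surface
    area = math.pi * w ** 2
    for _ in range(n_slices):
        weight = area * weight_density * dx               # G
        normal_force = 2.0 * math.pi * w * dx * p         # F_n on the wall
        p += (weight - mu * normal_force) / area          # Delta p
    return p
```

With, for example, w = 0.1, mu = 0.5 and weight_density = 1.5e4, the summed
pressure at depth 1.0 is practically equal to the saturation value
weight_density * w / (2 * mu) = 1500, whereas very close to the surface it
follows the hydrostatic law weight_density * x.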

This model calculation explains only the height dependence of the average 
pressure. As a granular medium is not an unstructured continuum but a dis- 
ordered discrete packing, pressure fluctuations are significant. Experiments and 
computer simulations show that the forces are concentrated on an irregular net- 
work of contacts (Fig. 1) and actually have a power-law distribution of normal 
(and tangential) components smaller than their average values [7], 

w{Fj^) ~ (in two dimensions). (3) 

For larger contact forces the distribution decays exponentially. 

1.3 Dilatancy, Fluidization and Collisional Cooling 

In flowing granular matter of high density the arches constituting the force
network break and reassemble differently all the time. These processes are governed
by two basic phenomena, dilatancy and collisional cooling.

Dilatancy was discovered by Reynolds in 1885, who observed that deformation
of a dense granular packing without an increase of its volume is impossible. The
particles sit in “cages” formed by their neighbours, and relative displacements
require that these cages break up, which necessarily implies an increase in the
pore volume between the particles. The dilatancy phenomenon can be spectacularly
demonstrated by putting a non-varnished wooden stick into a bottle which is then
filled with sand. By tapping the bottle the sand is compacted. If one now tries
to pull the stick out, the whole bottle is lifted and can be carried around on
the stick. The reason is that the walls suppress the volume increase necessary to
allow the shearing exerted by the rough surface of the stick when pulled out.

In a more general sense the dilatancy phenomenon also occurs in flowing 
granular materials. Consider for example Fig. 2, which shows results of a
computer simulation by Thompson and Grest, 1991 [8]. The horizontal motion of
the upper wall with a constant velocity U leads to the coexistence of a compact 
lower part of the granular medium between the plates and a fluidized upper part 
that acts as a “lubricant” for the motion of the wall. An increase of U leads to 
the fluidization of more material. For high enough velocities the compact lower 
part will vanish entirely, and the fluidization will be complete. In addition to the 
shear-induced fluidization we observe that the width of the fluidized region is
proportional to $U$, such that the average shear rate $\partial_y v_x$ is approximately
constant. As a result, an increased shear rate implies a larger volume. This is due to
the constant confining pressure on the walls, chosen as the boundary condition
here.

[Figure 2: simulation snapshot; panel labels: P, U = 22.4]

Fig. 2. Two-dimensional vertical system of discs subject to a horizontal shearing. P is
the pressure at the upper wall, and U its velocity relative to the fixed lower wall.
Molecular dynamics simulation as described in Sect. 2 with linear spring-dashpot model.
From [8]



Bagnold (1954) showed that, in order to keep a fully fluidized granular medium
at constant volume, the confining pressure must increase with the square of the
shear rate. This is also true for the shear stress:

$$\sigma_{xy} \propto (m/d)\,(\partial_y v_x)^2\,\mathrm{sign}(\partial_y v_x) \tag{4}$$

with a dimensionless proportionality constant; $m$ is the grain mass and $d$ is the
mean distance between the grains (in the two-dimensional system considered by
Bagnold the factor $1/d$ is absent in (4)). This shows that a fluidized granular
medium is a non-Newtonian fluid. In Newtonian fluids the internal friction is
proportional to the shear rate,

$$\sigma_{xy} = \eta\,\partial_y v_x, \tag{5}$$

where $\eta$ is the viscosity.

Bagnold gives a simple derivation for (4), which I rephrase here as a dimensional
argument: $\sigma_{xy}$ has the dimension of a force per area, i.e. [mass/(time$^2$ length)],
and describes the momentum transfer per unit time and unit area between
adjacent volume elements. Now Bagnold argues that the momentum transfer
is exclusively due to particle collisions induced by the shear rate $\partial_y v_x$ with
the dimension of an inverse time. Let us assume that the distance $d$ between the
particles and the particle radii $R$ are of the same order of magnitude. The only
dimensional parameters which can possibly enter the problem are therefore $m$,
$d$ and the shear rate. Equation (4) is the only combination of these quantities
with the proper dimensions. The proportionality constant will depend on the
dimensionless ratio $R/d$. Why does this argument not hold for molecular fluids,
where one finds (5) instead? The reason is that collisions of the molecules are
mostly due to their thermal motion. The thermal energy therefore is another
dimensional quantity which must enter the argument. Thus, a simple dimensional
argument is no longer possible.¹
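The uniqueness claimed in the dimensional argument can be made explicit by
matching exponents. The following short calculation is a sketch in which the
exponents $a$, $b$, $c$ and the constant $C$ are symbols introduced here for
illustration only:

```latex
\begin{align*}
\sigma_{xy} &= C\, m^{a}\, d^{\,b}\, (\partial_y v_x)^{c},
\qquad C \text{ dimensionless},\\
[\sigma_{xy}] &= \mathrm{mass}\cdot\mathrm{length}^{-1}\cdot\mathrm{time}^{-2}
 = \mathrm{mass}^{a}\cdot\mathrm{length}^{b}\cdot\mathrm{time}^{-c},\\
&\Rightarrow\quad a = 1,\qquad b = -1,\qquad c = 2,\\
\sigma_{xy} &= C\,\frac{m}{d}\,(\partial_y v_x)^{2}.
\end{align*}
```

Since three independent dimensions fix three exponents, no other combination of
$m$, $d$ and the shear rate is possible; the sign factor in (4) only ensures that
the stress opposes the shear.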

As soon as one stops agitating the fluidized granular medium by vibration or 
shearing, it “freezes” into a metastable state. Two facts are responsible for this: 
first the negligibility of the thermal energy of room temperature compared to 
the energy barrier, if one wants to move a grain on the surface of a dense packing 
from one local minimum to the next, and second collisional cooling. The term 
“cooling” here refers to granular temperature rather than to the thermodynamic 
one. By analogy with the equipartition theorem in thermodynamics, the notion 
of granular temperature was introduced [10] to characterize the mean square 
deviation of particle velocities from their average: 

$$T_g = \langle \mathbf{v}^2 \rangle - \langle \mathbf{v} \rangle^2. \tag{6}$$

Hence, in a static pile $T_g = 0$, whereas in the fluidized state $T_g \neq 0$. After each
collision the absolute value of the relative velocity between the two particles is 
reduced, because the dissipative nature of the interaction converts granular into 
thermodynamic temperature which henceforth is irrelevant for the dynamics of 
the granular medium. This is what is meant by collisional cooling. 
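Definition (6) and the effect of a single dissipative collision can be
illustrated with a few lines of Python. This is a sketch under simplifying
assumptions: the collision is one-dimensional, both particles have equal mass,
and the dissipation is described by a restitution coefficient (a quantity
introduced only later in the text); the function names are hypothetical.

```python
def granular_temperature(velocities):
    """Mean square deviation of the velocities from their average, Eq. (6).
    velocities is a list of equal-length tuples (one tuple per particle)."""
    n = len(velocities)
    dim = len(velocities[0])
    mean = [sum(v[k] for v in velocities) / n for k in range(dim)]
    mean_square = sum(sum(c * c for c in v) for v in velocities) / n
    return mean_square - sum(c * c for c in mean)


def collide_1d(v1, v2, restitution):
    """Head-on collision of two equal masses: the centre-of-mass velocity
    is conserved, the relative velocity is reversed and reduced."""
    v_cm = 0.5 * (v1 + v2)
    v_rel = v1 - v2
    return (v_cm - 0.5 * restitution * v_rel,
            v_cm + 0.5 * restitution * v_rel)
```

Two particles approaching each other with velocities +1 and -1 have $T_g = 1$;
after a collision with restitution coefficient 0.5 their velocities are -0.5
and +0.5, so that $T_g = 0.25$, while a uniform flow has $T_g = 0$ regardless of
its speed.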

In granular flow through a vertical pipe the collisional cooling leads to clus- 
tering of particles [11] in the pipe. For a sufficiently high density this can support 
the formation of arches, even in the absence of, for example, air flowing around 
the particles. The resulting kinematic waves are supposed to be analogous to 
spontaneous traffic jams on highways [12]. 

In this section I have discussed so far two flow regimes of granular media, one 
for high density (Hagen) and one for medium density (Bagnold). As a summary 
of this section I want to illustrate how they differ in the case of a vertical pipe. 
It is instructive to remind oneself first of the behaviour of a Newtonian fluid. 
In the steady state the divergence of the stress tensor (5), which is the internal 
friction force on a volume element, has to balance the gravitational acceleration, 

$$\rho m g = \partial_y \sigma_{xy}, \tag{7}$$

where $x$ is the vertical direction. Inserting (5) one finds that $\partial_y^2 v_x$ is constant. If $v$
is the average velocity in the pipe and $w$ its diameter, then the second derivative
of $v_x$ is of the order of magnitude $v/w^2$. Hence, we find the well-known result of
Hagen-Poiseuille flow, that

$$v \propto w^2. \tag{8}$$

¹ A simple dimensional argument is, however, possible if one postulates that the
viscosity of molecular gases does not depend on the shear rate. Then one obtains
$\eta \propto \sqrt{m k_B T}/R^2$ [9].

Granular flow has a much weaker dependence on the pipe diameter. First,
for high density, gravitational accelerations are balanced by arching, which
replaces the viscous internal friction (5). The physics of this balance is entirely
determined by the volume fraction of the grains (dimensionless), the Coulomb
friction coefficient (dimensionless), the diameter of the pipe and the gravitational
acceleration. Again applying dimensional arguments, the average flow velocity
must be

$$v \propto (g w)^{1/2}. \tag{9}$$

Second, in the fluidized regime, when Bagnold’s law (4) describes the internal
friction which balances the gravitational acceleration, application of (7) shows
that

$$v \propto w^{3/2}. \tag{10}$$

Recently we [13] discovered a third regime for low density in which the bal- 
ance of gravitational acceleration involves the production of granular tempera- 
ture due to particle collisions with the wall. The steady state is reached when 
the production of granular temperature is balanced by the collisional cooling. In 
this regime, 

$$v \propto w. \tag{11}$$

This analytical prediction has been confirmed by computer simulations for which 
we used the event-driven algorithm described in Sect. 3. 
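The exponents of the Newtonian and the Bagnold regime follow from the force
balance (7) by replacing each derivative with respect to $y$ by a factor $1/w$.
The estimate below is an order-of-magnitude sketch in which all numerical
prefactors are dropped:

```latex
\begin{align*}
\text{Newtonian fluid, (5) in (7):}\quad
 & \eta\,\partial_y^2 v_x = \rho m g
 \;\Rightarrow\; \eta\,\frac{v}{w^2} \sim \rho m g
 \;\Rightarrow\; v \sim \frac{\rho m g}{\eta}\, w^{2},\\[4pt]
\text{Bagnold regime, (4) in (7):}\quad
 & \partial_y\!\left[\frac{m}{d}\,(\partial_y v_x)^2\right] \sim \rho m g
 \;\Rightarrow\; \frac{m}{d}\,\frac{v^2}{w^3} \sim \rho m g
 \;\Rightarrow\; v \sim (\rho\, d\, g)^{1/2}\, w^{3/2}.
\end{align*}
```

The combination $\rho d$ has the dimension of an inverse area, so that
$(\rho d g)^{1/2} w^{3/2}$ indeed has the dimension of a velocity.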

1.4 Stick-and-Slip Motion and Self-Organized Criticality
(with S. Dippel) 

Coming back to Fig. 2, the question arises: what are the dynamics close to the
interface between the compact and the fluidized regions? One way to get insight
into this is to relax the constraint of fixed speed U of the upper plate by pulling it 
slowly with a spring. Then one gets stick-and-slip motion (Fig. 3). As long as the 
upper plate does not move, the spring tension increases linearly with time until 
a threshold is reached at which the plate starts sliding. The friction force exerted 
by the granular medium on the upper plate is smaller than the threshold force 
once it is in motion. Hence the plate is accelerated so that the spring tension 
drops sharply. However, then the friction wins and reduces the velocity of the 
plate to zero, and the plate sticks again until the spring tension exceeds the 
threshold the next time. 

Stick-and-slip motion is usually viewed as the consequence of a static friction 
coefficient that is larger than the dynamic one. However, the results in Fig. 3 
were obtained with a Molecular Dynamics simulation without any explicit im- 
plementation of static friction. In granular media the threshold dynamics leading 
to stick-and-slip phenomena can have a variety of origins. In the present exam- 
ple it is the dilatancy threshold which has to be overcome in order to move the 
upper plate: Fig. 3 shows that the distance between the plates is larger when 
there is motion and that gravitation and collisional cooling let the system freeze 
into a compact state when the upper plate is at rest. By contrast, in the case of 
static solid friction, microscopic asperities of the surfaces are interlocked. The 
simulation of Thompson and Grest in a way provides a macroscopic metaphor 
for this.

Fig. 3. Stick-and-slip motion: Time dependence of (a) the force per unit length, $f$,
with which the top wall is pulled, (b) displacement of the wall, and (c) spacing
between the walls, $h$, all in natural units ($d$ is the grain diameter). From [8]



The regularity of the stick-slip sequence is due to the fact that the threshold 
dynamics manifests itself only in a single degree of freedom, the position of the 
upper plate. Dropping this restriction, an important concept was introduced by
Bak, Tang and Wiesenfeld in 1987 [14]: self-organized criticality. Without going into
the details this concept can be illustrated in the present context by imagining 
the upper plate in Fig. 2 consisting of many small elements coupled by springs. 
Then the system develops a non-trivial distribution of tensions among all the 
springs. Therefore, if one element starts sliding, it causes neighbouring elements 
to exceed the threshold, too, starting an “avalanche” which has been proposed as 
a model for earthquakes [15]. Self-organized criticality means that the avalanches 
have no characteristic size, i.e. their size distribution is a power law. Introduc- 
tory reviews on the concept of self-organized criticality can be found in [16]. 
Originally, sand piles were proposed as a paradigm for self-organized criticality 
[14]. Adding grains to a pile, the surface becomes steeper and steeper until an 
avalanche starts. It turned out, however, that these avalanches have a power-law 
distribution only if inertia effects are avoided [17]. 



1.5 Segregation, Convection, Heaping (with S. Dippel) 

Geologically, most solid particles on earth were originally rocks. Sand is the result
of a long fragmentation process. However, as Kolmogorov showed in 1941 [18] in
a study of the statistics of crushed-ore sizes, a fragmentation process usually
leads to a much broader size distribution (ideally a log-normal distribution)
than that which we know from, for example, beaches. Figure 4 shows such a set
of fragments with a log-normal size distribution. It was obtained by dividing a
square by a straight line at an arbitrary position into two parts, then every part
again into two parts and so forth.

Fig. 4. Kolmogorov model of fragmentation
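The repeated division described above is easy to imitate if one keeps track of
the fragment sizes only. The Python sketch below is a simplification and not
the construction used for Fig. 4: it assumes that each fragment is split into
two parts whose size fractions are u and 1 - u with u drawn uniformly from the
unit interval, and it ignores the fragment shapes. The function names are
hypothetical.

```python
import math
import random


def fragment(generations, seed=0):
    """Split every fragment into two parts, generation after generation.
    Returns the list of fragment sizes (the initial size is 1)."""
    rng = random.Random(seed)
    sizes = [1.0]
    for _ in range(generations):
        next_sizes = []
        for size in sizes:
            u = rng.random()              # random position of the cut
            next_sizes.extend((u * size, (1.0 - u) * size))
        sizes = next_sizes
    return sizes


def log_size_moments(sizes):
    """Mean and standard deviation of ln(size), skipping empty fragments."""
    logs = [math.log(s) for s in sizes if s > 0.0]
    mean = sum(logs) / len(logs)
    variance = sum((x - mean) ** 2 for x in logs) / len(logs)
    return mean, math.sqrt(variance)
```

The logarithm of a fragment size is a sum of the logarithms of the random
fractions of all preceding cuts, which is why the size distribution tends
towards a log-normal form after many generations.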

Therefore the question arises: what is nature’s sieve? Most importantly, na- 
ture separates particle sizes by selective transport by water and wind, and by 
subsequent sedimentation. However, even without interaction with a hydrody- 
namic medium, size segregation occurs in granular media. Three mechanisms 
are familiar from everyday experience. First, small grains can percolate under 
the influence of gravity downward through the gaps of a stable packing of larger 
particles [19]. Second, in granular flow along rough surfaces small particles are 
more easily trapped than big ones [20, 21, 22, 23]. This is why, for example, in 
rockslides and on mountain slopes the bigger rocks accumulate at the bottom of 
the slope. It also explains the radial segregation in a drum that rotates about a 
horizontal axis [21, 24, 25]. If it is less than half filled with a mixture of small
and big particles of equal mass density, the large ones accumulate in the outer
regions within less than a full rotation (Fig. 5).

Fig. 5. Rotating two-dimensional drum with 2512 large (white) and 2483 (black) discs
with half the diameter, after three total revolutions. Simulation with the BTR model
described in Sect. 5.2. From [59]

A more intriguing (and also more controversial) mechanism of segregation can 
be observed when a box filled with particles of different sizes is shaken vertically. 
It has been termed the “Brazil nut effect” [26, 27, 28], as in Brazilian fruit trucks the
big nuts were found on top of the smaller fruit after transportation. Recently, it
has been shown that in many cases this segregation is intimately connected to
another phenomenon occurring in vibrated granular materials: convection. When
rough particles in a container with rough walls are vibrated, a flux of particles in 
the middle of the container moves upwards, forming a heap on the surface (see 
Fig. 6), while at the side walls particles are dragged down into the bulk. However, 
the zone in which downflow occurs is quite narrow so that large particles having 
been carried up to the top of the pile by the wide upward stream cannot enter 
it again, thus staying on top [29, 30, 31]. 

However, if the box is not vibrated vigorously enough for convection to set in 
or to extend throughout the whole depth of the container (as it usually starts up 
in the higher, more easily fluidized regions of the piling), other mechanisms seem 
to be at work. Due to the vibrations, small local rearrangements of the particles
can take place. A small particle can slip underneath a big one, leading to the
upward motion of the big particle. This is more likely than a well-concerted
giving way of all the small particles on which the big particle rests, which would
be needed for downward motion. The character of the ascent of the big particles
was found to depend on the ratio of the big and the small radii [27]. In two
dimensions, if this ratio is smaller than about 12.9 [32, 28] a single large disc
rises intermittently, depending on the amplitude of the shaking, whereas for
larger ratios it rises continuously in every shaking cycle.

Fig. 6. Size segregation, convection and heaping in a two-dimensional packing [61].
Simulation with the BTR model described in Sect. 5.1. Implementation of wall friction
as in [58]. Ratio of particle radii is 13. Left: after 60 shakes. Right: after 180 shakes

Among the less well understood phenomena related to heaping, segregation
and pattern formation in general, we would like to mention two which are
particularly intriguing: first, the dynamics of band segregation one observes in
three-dimensional drums (see for example [33]), and second, the cellular patterns
one obtains if a thin layer of grains is vibrated vertically, reminiscent of
Rayleigh-Bénard convection [34].



2 Molecular Dynamics Simulations I: Soft Particles

2.1 General Remarks 

In molecular dynamics simulations Newton’s equations of motion

$$m_i \ddot{\mathbf{r}}_i = \sum_{j=1}^{N} \mathbf{F}_{ij} \tag{12}$$

are discretized and solved numerically to give the time evolution of a system of
N particles. It should be borne in mind that this method is based on a model
of the forces $\mathbf{F}_{ij}$, so that the results have a range of validity which needs to be
assessed carefully in order to avoid wrong interpretations. The computational
challenge of this method lies in the fact that the hydrodynamic and agitation
time scales which one wants to investigate are vastly larger than the duration $\tau$
of a collision or the time between two collisions.

In this section I use a fixed discretization time step At, which should be 
about 10“^T, to keep relative errors of physical quantities integrated over the 
whole collision time of order 10“'^. In contrast to Lennard-Jones fluids, where 
the characteristic time r ~ 10“^^s, the collision time here is typically of the 
order r ~ 10“^s. 

As a result of the integration over one time step two grains may turn out to 
overlap. Within the soft-particle model [35, 36] of granular media this overlap 

$$\xi = R_i + R_j - |\mathbf{r}_{ij}| \tag{13}$$

(for spherical particles) is physically interpreted as the elastic deformation in
the collision between the two grains $i$ and $j$. Here $R_i$ denotes the radius of the
i-th particle, and $\mathbf{r}_{ij}$ is the difference between the centre of mass positions of the
collision partners. The forces depend on $\xi$ and the tangential component of the
relative velocity, as explained in the next section.
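In a program the overlap (13) is a one-line function of the particle positions
and radii. The Python sketch below is illustrative; the function names and the
convention that a positive return value means contact are assumptions made here.

```python
import math


def overlap(pos_i, pos_j, radius_i, radius_j):
    """Overlap (13) of two spherical particles; positive when they touch."""
    return radius_i + radius_j - math.dist(pos_i, pos_j)


def in_contact(pos_i, pos_j, radius_i, radius_j):
    """Only overlapping particles exert contact forces on each other."""
    return overlap(pos_i, pos_j, radius_i, radius_j) > 0.0
```

Two discs of radius 1 whose centres are 1.5 apart overlap by 0.5, whereas for a
centre distance of 3 the value is negative and no contact force acts.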

For integrating (12) we have used the Gear predictor-corrector scheme of fifth
order [37]: in every time step one calculates the position, velocity, acceleration
and the 3rd and 4th time derivative of the position of each particle. The Taylor
expansion of these five quantities gives a prediction $\mathbf{r}_p^{(i)}$, $i = 0, \ldots, 4$, for their
values after the next time step. The predicted acceleration $\mathbf{r}_p^{(2)}$ is now compared
with the force (divided by the mass) calculated for the predicted positions and
velocities. The difference $\Delta \mathbf{r}^{(2)}$ is used to correct the predicted values:

$$\mathbf{r}^{(i)} = \mathbf{r}_p^{(i)} + c_i\,\Delta \mathbf{r}^{(2)}. \tag{14}$$

The coefficient $c_0$ depends on whether or not the forces depend on the velocities.
Here they do, so that the coefficients are $(c_0, \ldots, c_4) = (19/90, 3/4, 1, 1/2, 1/12)$.
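The predictor and corrector steps can be sketched for a single degree of
freedom as follows. The Python code is a minimal illustration and not the
author's program. It assumes the common convention that the five quantities
are stored as scaled derivatives, r[i] = dt**i / i! times the i-th time
derivative, so that the Taylor prediction reduces to a multiplication with a
Pascal triangle; the function names and the choice of starting the 3rd and 4th
derivative at zero are assumptions as well.

```python
import math

# Corrector coefficients quoted in the text (velocity-dependent forces).
GEAR_C = (19 / 90, 3 / 4, 1.0, 1 / 2, 1 / 12)

# Taylor expansion for scaled derivatives: PASCAL[i][j] = binomial(j, i).
PASCAL = ((1, 1, 1, 1, 1),
          (0, 1, 2, 3, 4),
          (0, 0, 1, 3, 6),
          (0, 0, 0, 1, 4),
          (0, 0, 0, 0, 1))


def gear_step(r, accel, dt):
    """One predictor-corrector step for the scaled derivatives r[0..4]."""
    predicted = [sum(PASCAL[i][j] * r[j] for j in range(5)) for i in range(5)]
    # Force divided by mass at the predicted position and velocity.
    a = accel(predicted[0], predicted[1] / dt)
    delta = 0.5 * dt * dt * a - predicted[2]      # error of scaled acceleration
    return [predicted[i] + GEAR_C[i] * delta for i in range(5)]


def integrate(x0, v0, accel, dt, steps):
    """Integrate x'' = accel(x, x') and return the final position and velocity."""
    r = [x0, v0 * dt, 0.5 * dt * dt * accel(x0, v0), 0.0, 0.0]
    for _ in range(steps):
        r = gear_step(r, accel, dt)
    return r[0], r[1] / dt
```

A damped harmonic oscillator, whose force depends on the velocity like the
contact forces discussed in the following sections, is a convenient test case
because its exact solution is known.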

The reviews [38] and [39] contain many examples of where this type of simu- 
lation has been applied. Also the simulation results of Thompson and Grest [8] 
mentioned in Sect. 1 have been obtained with this model. The next few sections 
will address questions of implementation and validity of soft-particle molecular 
dynamics in the context of granular media.

2.2 Normal Force

Among the various implementations used in the recent literature I want
to discuss only those of the form

$$F_n = -k\,\xi^{\alpha} - \gamma\,\xi^{\beta-1}\dot{\xi}. \tag{15}$$

The first term is the elastic restoring force and the second term describes
viscoelastic dissipation. One has to imagine that (15) is the result of an integration
over the contact area, which increases with $\xi$. Therefore, even if the material of
the colliding particles is described by linear elasticity, in general the first term
deviates from Hooke’s law, $\alpha = 1$. Hertz (1882) showed that for elastic spheres
$\alpha = 3/2$ [40].

It is clear that $\alpha = \beta$ if the material of the colliding particles obeys linear
viscoelasticity [41]. Then the stress tensor is equal to a sum of two terms, one
proportional to the strain tensor and one proportional to its time derivative. The
integral over the contact area is then a corresponding sum of two terms, whereby
the second is proportional to the time derivative of the first one. Nonetheless,
frequently $\alpha = 3/2$ is used together with $\beta = 1$; however, we shall see that some
results are only applicable to rather exotic materials with nonlinear
viscoelasticity.

As explained in the previous section we need to know the collision time in
order to choose the time step of the simulation appropriately. The collision time
for the normal force (15) can be estimated by a simple dimensional argument. I
neglect the damping term, which can only increase the collision time. Then the
equation

$$\ddot{\xi} = -(k/m)\,\xi^{\alpha} \tag{16}$$

has to be solved with initial conditions $\xi(0) = 0$ and $\dot{\xi}(0) = v_n$, the normal
component of the relative velocity before the collision. The two parameters $k/m$
and $v_n$ can be combined in unique ways to give a characteristic time and a
characteristic length, which have to be of the same order of magnitude as the
collision time $\tau$ and the maximal overlap $\xi_{\max}$, respectively:

$$\tau \sim \xi_{\max}/v_n, \qquad \xi_{\max} \sim (m/k)^{1/(\alpha+1)}\, v_n^{2/(\alpha+1)}. \tag{17}$$

For $\alpha \neq 1$ the collision time depends on the impact velocity. Therefore the
simulation time step has to be chosen in accordance with the relative velocities
to be expected during the simulation.
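The scaling (17) can be verified by integrating (16) numerically. The following
Python sketch uses the velocity Verlet scheme instead of the Gear algorithm,
simply because it is shorter; the function name, the resolution parameter and
the linear interpolation of the final zero crossing are assumptions of this
illustration.

```python
def collision(v_n, k_over_m, alpha, steps_per_unit=2000):
    """Integrate (16) from xi = 0 until the overlap returns to zero.
    Returns the collision time and the maximal overlap."""
    # Characteristic length and time from the dimensional argument (17).
    xi_scale = (1.0 / k_over_m) ** (1.0 / (alpha + 1.0)) \
        * v_n ** (2.0 / (alpha + 1.0))
    dt = (xi_scale / v_n) / steps_per_unit
    xi, v, a, t, xi_max = 0.0, v_n, 0.0, 0.0, 0.0
    for _ in range(100 * steps_per_unit):
        xi_new = xi + v * dt + 0.5 * a * dt * dt
        if xi_new <= 0.0:
            # Contact ends within this step: interpolate the crossing time.
            return t + dt * xi / (xi - xi_new), xi_max
        a_new = -k_over_m * xi_new ** alpha
        v += 0.5 * (a + a_new) * dt
        xi, a = xi_new, a_new
        t += dt
        xi_max = max(xi_max, xi)
    raise RuntimeError("collision did not end within the allotted steps")
```

For the Hertz exponent $\alpha = 3/2$ the collision time is proportional to
$v_n^{-1/5}$, so an impact that is 32 times faster lasts half as long, while for
$\alpha = 1$ the collision time $\pi\sqrt{m/k}$ does not depend on the velocity.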

The dissipation in a head-on collision is characterized by the coefficient of
normal restitution, an important material parameter, which is defined as the
ratio between the final and the initial normal component of the relative velocity,

$$e_n = |v_n^{(f)}| / |v_n^{(i)}|. \tag{18}$$

It varies between 0 for completely inelastic and 1 for perfectly elastic collisions.
Experimentally one finds for a large class of materials [42, 43, 41, 44] that the
restitution coefficient decreases with increasing $v_n^{(i)}$, i.e. the more violent the
impact, the more dissipative it is.

With (15) and (17) one can easily estimate the energy dissipation in one
collision, and hence the restitution coefficient. The typical value of the damping
force is $\gamma\,\xi_{\max}^{\beta}/\tau$; the dissipated energy gets an extra factor $\xi_{\max}$:

$$E^{(f)} - E^{(i)} \sim -\gamma\,\xi_{\max}^{\beta+1}/\tau. \tag{19}$$

With $E^{(i)} = m (v_n^{(i)})^2/2$ the restitution coefficient can be expanded for weak
dissipation as

$$e_n = 1 + (E^{(f)} - E^{(i)})/2E^{(i)}. \tag{20}$$

Inserting (19) and (17) results in

$$1 - e_n \propto v_n^{(2\beta - \alpha - 1)/(\alpha + 1)}. \tag{21}$$

Figure 7 shows $1 - e_n$ as a function of $v_n^{(i)}$ in a double logarithmic plot [45]. The
data were obtained by implementing (15) for $\alpha = \beta = 1$ (linear spring-dashpot
model), for $\alpha = \beta = 3/2$ (Hertz-Kuwabara-Kono model), and for $\alpha = 3/2$,
$\beta = 1$ (Hertz linear dashpot model). The results are in very good agreement
with (21).
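The velocity dependence of the restitution coefficient can be reproduced with a
short program. The Python sketch below is not the code used for Fig. 7. It
assumes a normal force with an elastic term proportional to the overlap to the
power alpha and a damping term proportional to the overlap to the power
beta - 1 times the overlap rate, integrates a single head-on collision with a
fourth-order Runge-Kutta scheme, and regards the collision as finished when
the overlap returns to zero; all names and parameter values are illustrative.

```python
import math


def restitution(v_in, k, gamma, m, alpha, beta, steps_per_unit=2000):
    """Restitution coefficient of one head-on collision (requires beta >= 1)."""

    def accel(xi, v):
        x = max(xi, 0.0)      # no contact force once the overlap is gone
        return -(k * x ** alpha + gamma * x ** (beta - 1.0) * v) / m

    # Time step from the characteristic collision time (17).
    tau = (m / k) ** (1.0 / (alpha + 1.0)) \
        * v_in ** ((1.0 - alpha) / (alpha + 1.0))
    dt = tau / steps_per_unit
    xi, v = 0.0, v_in
    for _ in range(200 * steps_per_unit):
        k1x, k1v = v, accel(xi, v)
        k2x = v + 0.5 * dt * k1v
        k2v = accel(xi + 0.5 * dt * k1x, k2x)
        k3x = v + 0.5 * dt * k2v
        k3v = accel(xi + 0.5 * dt * k2x, k3x)
        k4x = v + dt * k3v
        k4v = accel(xi + dt * k3x, k4x)
        xi_new = xi + dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v_new = v + dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        if xi_new <= 0.0:
            # Interpolate the velocity at the moment the overlap vanishes.
            fraction = xi / (xi - xi_new)
            return -(v + fraction * (v_new - v)) / v_in
        xi, v = xi_new, v_new
    raise RuntimeError("collision did not end within the allotted steps")
```

For the linear spring-dashpot model the result can be compared with the exact
value $\exp(-\pi\gamma/2m\omega)$, where $\omega^2 = k/m - (\gamma/2m)^2$, and it
does not depend on the impact velocity. With $\alpha = 3/2$ and $\beta = 1$ the
computed restitution coefficient grows with the impact velocity.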




Fig. 7. Log-log plot of $1 - e_n$ versus the normal component $v_n^{(i)}$ [m/s] of the
relative velocity. $\alpha = \beta = 1$ (circles), $\alpha = \beta = 3/2$ (triangles), and
$\alpha = 3/2$, $\beta = 1$ (squares). The power laws predicted by (21) are indicated by
the lines



The Hertz linear dashpot model gives the result that $e_n$ increases with $v_n$,
i.e. more violent impacts are more elastic. The same is true whenever
$\beta < (\alpha + 1)/2$, according to (21). Materials are known which become more elastic for
high frequencies and are easily deformed plastically at low frequencies. Equation
(17) implies that the collision time decreases with increasing impact velocity,
as $\alpha > 1$ normally, and hence the collision may well become more elastic for those
“exotic” materials. This does not apply, however, to most “normal” materials
like metals, glasses, hard plastic etc. Nevertheless the Hertz linear dashpot model
has been used in many recent simulations. The results should reflect the correct
physical behaviour of normal materials, provided the relative velocities do not
vary over a wide range. On the other hand, all these simulations show a tendency
towards fluidization. One of the reasons for this could be that large relative
velocities are damped out very slowly and contribute strongly to the granular
temperature. Other reasons will be discussed in Sects. 2.4 and 2.5.

In any model with $\alpha = \beta > 1$ large relative velocities are damped out faster
than slow relative velocities, and this trend is the stronger the larger $\alpha$ is. For the
linear spring-dashpot model the restitution coefficient does not depend on the 
relative velocity. Together with the fact that the collision time does not depend 
on the velocity either, and that the linearity makes this law very well behaved 
numerically, the simple linear spring-dashpot model is a good compromise be- 
tween physical accuracy and numerical efficiency. However, there are physical 
phenomena for which the nonlinearity of the Hertz law is important. The global 
elastic properties of a granular packing are one example [46]. In a dense system 
the overlap depends on the pressure. Linearizing the Hertz law for small fluctu- 
ations of the overlap around a non-zero average shows that the effective spring 
constant increases with the external pressure. This explains why shock waves 
spread faster through a granular medium the more compressed it is. 
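
The linearization mentioned above can be written out explicitly. As a sketch, assume that the elastic part of the Hertz law has the form $F = k\,\xi^{3/2}$; the symbols $\xi_0$ (average overlap), $F_0$ (corresponding force) and $k_{\mathrm{eff}}$ are introduced here for illustration only.

```latex
F(\xi_0 + \delta\xi) \approx k\,\xi_0^{3/2} + \tfrac{3}{2}\,k\,\xi_0^{1/2}\,\delta\xi ,
\qquad
k_{\mathrm{eff}} = \tfrac{3}{2}\,k\,\xi_0^{1/2} = \tfrac{3}{2}\,k^{2/3} F_0^{1/3} ,
\qquad
F_0 = k\,\xi_0^{3/2} .
```

Since the speed of elastic waves scales as the square root of the stiffness, this simple estimate suggests a propagation velocity that grows as $F_0^{1/6}$ with the confining force.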

2.3 Tangential Force 

The implementation of dynamical friction (1) is straightforward. Nevertheless
two remarks are in order. First, it should be borne in mind that the different
implementations of the normal force $F_n$ also influence the friction behaviour.
Second, $v_t$ is not the tangential component of the relative velocity of the centres
of mass of the colliding particles, but that of the relative velocity at the particle
surfaces at the point of contact. It contains contributions of the particle rotations.
If two circular discs ($i = 1, 2$) with radii $R_i$ and angular velocities $\omega_i$ (positive
for counter-clockwise rotation) collide in a two-dimensional system, the tangential
velocity is given by

$v_t = (\mathbf{v}_2 - \mathbf{v}_1) \cdot \mathbf{t} + \omega_1 R_1 + \omega_2 R_2,$ (22)

where $\mathbf{t}$ denotes the unit tangential vector. The equation of motion for the centres
of mass (12) has to be supplemented by that for the spin of the particles,

$\dot\omega_i = R_i F_t / I_i,$ (23)

where $I_i$ denotes the moment of inertia.
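
As an illustration of the contact-point velocity, the following Python sketch evaluates the expression $v_t = (\mathbf{v}_2 - \mathbf{v}_1)\cdot\mathbf{t} + \omega_1 R_1 + \omega_2 R_2$ for two discs. The orientation of the tangent vector is not fixed by the text; the sketch assumes that the normal $\mathbf{n}$ points from disc 1 to disc 2 and that $\mathbf{t} = (n_y, -n_x)$, which reproduces the signs of the rotational terms. The function name and argument layout are hypothetical.

```python
import math


def tangential_velocity(r1, v1, omega1, R1, r2, v2, omega2, R2):
    """Relative tangential velocity of the two disc surfaces at the contact point.

    Assumed convention (see lead-in): n points from disc 1 to disc 2,
    t = (n_y, -n_x), angular velocities are positive counter-clockwise.
    """
    dx, dy = r2[0] - r1[0], r2[1] - r1[1]
    dist = math.hypot(dx, dy)
    nx, ny = dx / dist, dy / dist            # unit normal from disc 1 to disc 2
    tx, ty = ny, -nx                         # unit tangent
    dvx, dvy = v2[0] - v1[0], v2[1] - v1[1]  # relative velocity of the centres
    return dvx * tx + dvy * ty + omega1 * R1 + omega2 * R2
```

Two discs at rest that spin in the same sense give a non-zero $v_t$, whereas two discs that mesh like gears ($\omega_1 R_1 = -\omega_2 R_2$) give $v_t = 0$.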

Figure 8 shows the final tangential velocity as a function of the initial
tangential velocity [45], both normalized by the initial normal velocity, for
$\mu_d = 0.25$, and compares the curves with experimental data obtained for cellulose
acetate spheres [47, 48]. The behaviour for sufficiently large initial tangential
velocities can be well fitted by adjusting $\mu_d$. However, for small values the
dynamic friction simply reduces the tangential velocity to zero (strictly speaking
the finite simulation time step always leads to an overshooting so that the final
velocity has tiny oscillations around zero, not observable in Fig. 8). In the
experiment, by contrast, the tangential velocity is reversed. The physical reason
is a tangential elastic restoring force which builds up in a sticking contact and
hence is connected with static friction.

Fig. 8. Final versus initial tangential velocity, both normalized with the initial normal
velocity. Dashed line: simulation with (1) [45]. Experimental data from [47]

The simplest implementation of tangential elasticity is due to Cundall and
Strack (1979) [35]. As soon as two particles touch ($t = t_0$), an imaginary spring
is attached to the points of contact. During the collision the tangential
displacement $\zeta$ of the two points of first contact is recorded and identified with
an elastic stretching of the spring,

$\zeta(t) = \int_{t_0}^{t} v_t(t')\,dt'.$ (24)

The friction law (1) is then usually replaced by

$F_t = -\min(|k_t \zeta|, |\mu_d F_n|)\,\mathrm{sign}(\zeta).$ (25)

Figure 9 shows that with this model one can fit the experimental data. 
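
The tangential spring can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of [35] or [45]: the force is the spring force $k_t\zeta$ capped at the Coulomb limit $\mu_d |F_n|$, and $\zeta$ is accumulated by a simple rectangle rule. The class and function names are hypothetical.

```python
import math


def tangential_force(zeta, f_n, k_t, mu_d):
    """Spring force k_t*zeta opposing the displacement, capped at mu_d*|F_n|."""
    if zeta == 0.0:
        return 0.0
    magnitude = min(abs(k_t * zeta), abs(mu_d * f_n))
    return -math.copysign(magnitude, zeta)


class TangentialSpring:
    """Bookkeeping for one contact (illustrative names only)."""

    def __init__(self):
        self.zeta = 0.0                  # the spring is unstretched at first touch

    def update(self, v_t, dt, f_n, k_t, mu_d):
        self.zeta += v_t * dt            # discretized displacement integral
        return tangential_force(self.zeta, f_n, k_t, mu_d)
```

A new `TangentialSpring` would be created when two particles first touch and discarded when they separate. In this sketch $\zeta$ keeps growing even while the force is capped, in line with the description that the force then no longer depends on $\zeta$.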

In order to explain what (25) means, let us consider a collision with initial
tangential velocity $v_t$. First the imaginary spring gets stretched, and if the
tangential velocity reaches zero before $|F_t|$ reaches $|\mu_d F_n|$, the contact
should be viewed as nonsliding. Then one gets an (undamped) fictitious tangential
oscillation as long as the contact exists. As soon as the particles lose their
overlap due to normal motion, the spring vanishes. Depending on the phase of the
tangential oscillation when this happens, the final tangential velocity can be
reversed compared to the initial one. If the contact persists, the tangential
spring can resist a finite external tangential force and thus models static
friction. If $|k_t \zeta|$ exceeds $|\mu_d F_n|$ the force no longer increases. In
this case one has to distinguish two scenarios. If the tangential velocity never
reaches zero before the two particles separate again, the energy spent to stretch
the spring is lost and the contact has been sliding. Note that there is no
difference between the static and dynamic friction coefficient in this model. The
second scenario is that the tangential velocity reaches zero while the contact
still exists. Then the particle starts an imaginary oscillation as above and the
contact is interpreted as sticking. However, now the tangential oscillation is
not harmonic: as long as $|k_t \zeta| > |\mu_d F_n|$ the force does not depend on
$\zeta$. Note that in this case $F_t = -|\mu_d F_n|\,\mathrm{sign}(\zeta)$ cannot be
interpreted as sliding friction, which never would lead to a reversal of
tangential motion, and the work done by this force is not dissipated. Again the
tangential velocity at the end of the collision may be opposite to its initial
direction.

Fig. 9. Same as Fig. 8. Simulation with (25), $k_t/k_n = 1/5$ [45]



2.4 Detachment Effect 

Equation (17) shows that the collision time $\tau$, and hence the simulation time
step, decreases with increasing stiffness $k$. As the phenomena one is interested
in, such as segregation or convection, take place on much larger time scales, one
common simulation strategy has been to choose $k$ much smaller than in real
materials, thereby allowing larger time steps. This seemed to be a good trade-off
for the artificially enlarged collision times. The restitution coefficient can be
kept at a realistic value by choosing $\gamma$ accordingly.

Luding [49] showed that this strategy only works for low densities, i.e. in
the fully fluidized state. If the collision time is enlarged so much that it becomes
comparable to the time between successive collisions (still much shorter than the
times one is interested in) then the dynamical behaviour of the system changes
drastically. Instead of having many successive binary collisions, one gets
many-particle collisions in which clusters of several particles overlap at the same
time. Luding argues that this reduces dissipation enormously.

He considers a simple example of a one-dimensional equidistant arrangement
of $N$ grains that all move with the same velocity $v$ towards a plate from which
they are going to be reflected. If the distance is much smaller than $v\tau$ then there is
essentially a single multi-particle collision of all grains. The whole cluster can be
viewed as a single elastic particle with a total deformation $N\xi_{\max}$. The collision
time of the cluster is then $N$ times the collision time of a binary collision. For the
linear spring-dashpot model this implies that the dissipated energy per particle
is the same as in a single binary collision. By contrast, if the distance between
the particles initially was much larger than $v\tau$ then one gets a sequence of about
$N^2/2$ binary collisions until all particles are reflected. The dissipated energy per
particle is significantly increased.
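
The estimate of about $N^2/2$ binary collisions for well-separated grains can be checked by direct counting. The Python sketch below is a deliberately simplified illustration, not the calculation of [49]: it assumes point particles of equal mass and perfectly elastic collisions, so that colliding grains simply exchange velocities and the plate reverses the velocity of the grain that hits it. The function name and default values are hypothetical.

```python
def count_collisions(n, spacing=1.0, speed=1.0):
    """Count all collisions of n grains moving towards a plate at x = 0."""
    x = [spacing * (i + 1) for i in range(n)]   # grain 0 is closest to the plate
    v = [-speed] * n
    count = 0
    while True:
        best_t, best_i = None, None
        if v[0] < 0:                            # only grain 0 can hit the plate
            best_t, best_i = x[0] / -v[0], -1
        for i in range(n - 1):                  # neighbouring grains that approach
            closing = v[i] - v[i + 1]
            if closing > 0:
                t = (x[i + 1] - x[i]) / closing
                if best_t is None or t < best_t:
                    best_t, best_i = t, i
        if best_t is None:                      # nothing approaches any more
            return count
        x = [xi + vi * best_t for xi, vi in zip(x, v)]
        if best_i == -1:
            v[0] = -v[0]                        # reflection at the plate
        else:
            v[best_i], v[best_i + 1] = v[best_i + 1], v[best_i]
        count += 1
```

In this idealized case the count is $N(N+1)/2$, i.e. $N$ reflections at the plate plus $N(N-1)/2$ grain-grain collisions, which approaches $N^2/2$ for large $N$.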

This example shows that dissipation is suppressed in dense systems if one
artificially increases $\tau$ in order to be able to simulate larger time intervals. This
leads to unrealistically strong fluidization. Event-driven simulations (see Sect.
3) avoid the detachment effect.

2.5 Brake Failure Effect (with J. Schäfer)

The molecular dynamics scheme described in this section gives rise to an artefact,
the so-called brake failure effect. Consider two equal spheres of radius $R$ that
collide with initial relative velocity $v$ under an angle $\vartheta$. We choose Cartesian
coordinates such that the $x$-axis is in the direction of the initial relative velocity.

A braking function can be defined as the change of $v_x$ caused by the
interaction between the colliding spheres,

$\Delta v_x \propto \int_{t_{cont}} F_x(t)\,dt,$ (26)

where $t_{cont}$ is the time for which the spheres overlap. If they could freely
penetrate each other without feeling any interaction, they would overlap for a time

$\tau_0 = 4R\cos\vartheta / v.$ (27)

As long as $\tau \ll \tau_0$, the simulation gives correct results: the contact time is
determined by the collision time (17), $t_{cont} \approx \tau$. Then the spheres are reflected
from each other, i.e.

$\Delta v_x \propto v,$ (28)

with geometrical factors depending on $\vartheta$. The braking function increases linearly
with the initial velocity.

However, for $\tau_0 \ll \tau$, the braking function behaves unphysically. The
duration of contact becomes $t_{cont} = \tau_0$. The integral (26) can be approximated by
$\int F_x(t)\,dt \approx \bar F_x \tau_0$, where $\bar F_x$ is a mean force in $x$-direction. Using (27) we
obtain $\Delta v_x \propto \bar F_x \tau_0 \propto 1/v$. This decrease of the braking function with
increasing initial velocity is meant by the term brake failure.

The transition between both regimes takes place at $\tau_0/\tau \approx 1$, such that we
obtain a critical velocity for brake failure $v_c$:

$v_c = 4R\cos\vartheta / \tau,$ (29)

being smaller the more oblique the impact and the higher the collision time $\tau$
is. The brake failure effect therefore can be stated in the following terms:
Time-step-driven simulations of collisions exhibit an unrealistically small
dissipation when the impact velocity exceeds the critical velocity $v_c$.
In principle this may even happen for a frontal collision, $\vartheta = 0$, if the
impact velocity is so large that the maximal overlap $\xi_{\max} > 4R$. For a reasonable
simulation setup this can be avoided. But as one approaches grazing incidence,
$\vartheta = \pi/2$, brake failure is bound to set in. To illustrate this effect, Fig. 10 shows
the braking function for two spheres.
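
Whether a given parameter set is at risk of brake failure can be estimated before a simulation is run. The Python sketch below evaluates the free overlap time $\tau_0 = 4R\cos\vartheta/v$ and the critical velocity $v_c = 4R\cos\vartheta/\tau$ for two equal spheres; the function names are hypothetical, and the collision time $\tau$ has to be supplied by the user, e.g. from (17).

```python
import math


def free_overlap_time(radius, theta, v):
    """Time two non-interacting spheres of equal radius would overlap."""
    return 4.0 * radius * math.cos(theta) / v


def critical_velocity(radius, theta, tau):
    """Impact velocity at which the free overlap time equals the collision time."""
    return 4.0 * radius * math.cos(theta) / tau


def brake_failure_expected(radius, theta, v, tau):
    """True if the impact velocity exceeds the critical velocity."""
    return v > critical_velocity(radius, theta, tau)
```

For example, with $R = 1$ and $\tau = 1$ (arbitrary units) a frontal collision has $v_c = 4$, while for $\vartheta = 77.2°$ the critical velocity drops below 1.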

A possible artefact due to brake failure is bistability in simulated granular flow
through a vertical pipe [51]: instead of a unique steady-state velocity one obtains
a physical or an unphysical steady state depending on the initial conditions. Like
the detachment effect, brake failure is avoided in event-driven simulations (next
section).

Fig. 10. Brake failure: for velocities larger than $v_c$ collisions have vanishing effect. The
relative change of smaller velocities decreases the more grazing the collision is. From
top to bottom: $\vartheta$ = 64.2°, 71.8°, and 77.2°. Different symbols indicate data collapse for
various different collision times at fixed $\vartheta$. From [50]

3 Molecular Dynamic Simulations II: Hard Particles
(with J. Schäfer)

3.1 Event-Driven Simulation 

In systems in which the particle contacts can be considered as instantaneous 
binary collisions and only the contact forces act between the grains, the evolu- 
tion of the system need not be integrated numerically: between collisions, the 
analytical solution of the equations of motion can be used. Then two tasks re- 
main to be done: the calculation of the time of the next collision in the system, 
and the definition of a collision operator that describes post-collisional velocities 
and angular velocities as a function of the pre-collisional ones and the impact 
geometry. The former can be done using simple algebra and some intelligent 
bookkeeping [37]; the latter involves some definite (and simplified) model of the 
physics of colliding bodies. 
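
The "simple algebra" for the time of the next collision amounts to solving a quadratic equation. As a minimal sketch, assume two discs in free flight with no force acting between collisions (a uniform gravitational field acting equally on both discs would not change their relative motion). The function name and argument layout are hypothetical, and the bookkeeping over all pairs described in [37] is not shown.

```python
import math


def collision_time(r1, v1, R1, r2, v2, R2):
    """Time until two discs touch, or None if they never collide.

    Solves |dr + dv*t|^2 = (R1 + R2)^2 for the smallest positive t.
    """
    dx, dy = r2[0] - r1[0], r2[1] - r1[1]
    dvx, dvy = v2[0] - v1[0], v2[1] - v1[1]
    b = dx * dvx + dy * dvy                  # negative only if the discs approach
    if b >= 0.0:
        return None
    a = dvx * dvx + dvy * dvy
    c = dx * dx + dy * dy - (R1 + R2) ** 2
    discriminant = b * b - a * c
    if discriminant < 0.0:
        return None                          # the trajectories miss each other
    return (-b - math.sqrt(discriminant)) / a
```

In an event-driven code this function would be evaluated for candidate pairs, and the smallest returned time determines the next event.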

We use the assumption that a collision of two grains can be entirely described
by a coefficient of normal restitution $e_n$ plus two coefficients related to
tangential motion: the coefficient of friction $\mu$, which applies for high impact
angles where dynamic friction occurred throughout the collision, and the
coefficient of tangential restitution $e_t$, which applies for low impact angles. On
the basis of this simplified picture, the following collision operator can be
derived [52, 47].

3.2 Collision Operator 

The collision operator is completely determined if we know the momentum
change $\mathbf{J}$ that occurs during the contact. Decomposing $\mathbf{J}$ into normal and
tangential components, $J_n$ and $J_t$, we have for the difference of final and initial
velocities of grains $i$ and $j$ ($\Delta\mathbf{v}_i = \mathbf{v}_i^f - \mathbf{v}_i^i$ etc.):

$m_i \Delta\mathbf{v}_i = -m_j \Delta\mathbf{v}_j = \mathbf{J},$ (30)

$(I_i/R_i)\,\Delta\omega_i = (I_j/R_j)\,\Delta\omega_j = J_t.$ (31)

Here, $R$ is the radius and $I = \frac{2}{5} m R^2$ the rotational inertia of the grains. The
relative velocity at the contact points has the form

$\mathbf{v} = \mathbf{v}_i - \mathbf{v}_j + (R_i\omega_i + R_j\omega_j)\,\mathbf{t}.$ (32)

Using (30) and (31), we obtain:

$\mathbf{v}^f - \mathbf{v}^i = \frac{1}{m_{eff}} J_n \mathbf{n} + \frac{7}{2 m_{eff}} J_t \mathbf{t},$ (33)

where $m_{eff} = m_i m_j/(m_i + m_j)$ is the effective mass.

The three coefficients that describe the impact are the coefficient of normal
restitution $e_n = -v_n^f/v_n^i$, the friction coefficient $\mu = |J_t|/|J_n|$, and the
coefficient of tangential restitution $e_t = v_t^f/v_t^i$. Using these definitions
together with the normal and tangential component of Eqn. (33) yields the
following components of $\mathbf{J}$:

$J_n = -m_{eff}(1 + e_n)\,v_n^i,$ (34)

$J_t^{(\mu)} = -\mu\, m_{eff}(1 + e_n)\,v_n^i\,\psi/|\psi|,$ (35)

and

$J_t^{(e_t)} = \frac{2}{7}\, m_{eff}(e_t - 1)\,v_t^i.$ (36)

Here, $\psi = v_t^i/v_n^i$ is a parameter measuring the obliqueness of the impact. The
transition between the cases (35) and (36) occurs when they take equal values,
i.e. at an obliqueness of

$\psi_0 = \frac{7}{2}\,\mu\,\frac{1 + e_n}{1 - e_t}.$ (37)
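
The collision operator can be sketched compactly in Python. The sketch assumes homogeneous spheres ($I = \frac{2}{5} m R^2$), the definitions $e_n = -v_n^f/v_n^i$ and $e_t = v_t^f/v_t^i$, a normal impulse $J_n = -m_{eff}(1+e_n)v_n^i$, a friction-limited tangential impulse of magnitude $\mu |J_n|$ opposing $v_t^i$, and a restitution-controlled tangential impulse $\frac{2}{7} m_{eff}(e_t - 1) v_t^i$. Selecting the tangential impulse of smaller magnitude is used here as an equivalent way of comparing the obliqueness with its threshold; this selection rule and all function names are choices of the sketch, not taken from [52, 47].

```python
def collision_impulse(v_n, v_t, e_n, e_t, mu, m_eff):
    """Return (J_n, J_t) for the pre-collisional contact velocity (v_n, v_t)."""
    j_n = -m_eff * (1.0 + e_n) * v_n
    # Tangential impulse if the contact slides throughout the collision.
    sign_vt = (v_t > 0) - (v_t < 0)
    j_t_friction = -mu * abs(j_n) * sign_vt
    # Tangential impulse prescribed by the coefficient of tangential restitution.
    j_t_restitution = (2.0 / 7.0) * m_eff * (e_t - 1.0) * v_t
    # Oblique impacts are friction-limited: take the impulse of smaller magnitude.
    if abs(j_t_friction) < abs(j_t_restitution):
        return j_n, j_t_friction
    return j_n, j_t_restitution


def apply_impulse(v_n, v_t, j_n, j_t, m_eff):
    """Post-collisional contact velocity for homogeneous spheres (factor 7/2)."""
    return v_n + j_n / m_eff, v_t + 3.5 * j_t / m_eff
```

For a nearly normal impact the final tangential velocity is $e_t v_t^i$, while for a strongly oblique impact the ratio $|J_t|/|J_n|$ equals $\mu$.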



3.3 Limitations 

When particle densities are low, e.g. in a dilute gas, the event-driven simulation 
scheme works very fast compared to the time-step-driven method. However, the 
explicit determination of the time of the next collision is the most expensive part 
of the calculation, and for dense systems in which the frequency of collisions 
is high, the method becomes less advantageous. A real limitation arises from 
the assumption of instantaneous impacts. The dynamics of dissipative systems 
such as granular materials tend to produce clustered states [11, 53], and in this 
situation long-lasting contacts, not contained in the concepts of the method, 
are frequent. Much as Achilles in Zeno’s famous paradox can never overtake the 
tortoise, the simulation time then converges towards the time when a long-lasting 
contact closes, whereas the collision frequency between the respective particle 
pair diverges. This is called “inelastic collapse” [53]. In principle, an algorithm 
handling clusters as a distinct class of particles could circumvent this problem; 
such a routine exists for 1D systems [54], but its extension to two dimensions is
non-trivial and has not yet been undertaken. 
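
The Zeno-like convergence can be made concrete with the simplest textbook analogue, which is not the multi-particle situation analysed in [53]: a single ball bouncing on a floor under gravity with a restitution coefficient $e_n < 1$. Each bounce reduces the take-off velocity, and hence the flight time, by the factor $e_n$. If the first flight lasts $t_0$ (a symbol introduced here for illustration), the total time of all flights is

```latex
T = \sum_{k=0}^{\infty} t_0\, e_n^{\,k} = \frac{t_0}{1 - e_n} < \infty .
```

Infinitely many collisions thus occur within the finite time $T$, and an algorithm that processes them one by one never advances beyond $T$.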

4 Contact Dynamics Simulations 
(with L. Brendel and F. Radjai) 

4.1 General Remarks 

One of the most interesting theoretical challenges is to understand the emergence
of characteristic lengths and times relevant for the collective behaviour of a
packing with lasting contacts from microscopic contact interactions. One simple
example is an array of parallel cylinders of equal radius $R$ on a horizontal plane
(Fig. 11). It is pushed with a constant force by a block and terminated by a
similar block, such that the cylinders stay in contact with each other and with the
plane. The important feature of this arrangement is that the rotational degrees of
freedom are frustrated. This is illustrated in Fig. 12: because of Coulomb friction,
a contact wants to be rolling ($v_t = 0$) instead of sliding ($v_t \neq 0$), i.e. neighbouring
cylinders want to have opposite spin. (Spin here is meant in the mechanical
sense, but the analogy with an antiferromagnet is perfectly correct.) If there
are three contacts forming a closed loop, at least one of them has to be sliding.
The question then arises, how do the rotational degrees of freedom organize
themselves into a collective structure in a simple situation like Fig. 11?

Fig. 11. System with many lasting contacts, showing self-organization of rotational
degrees of freedom
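
The frustration in a closed loop can be made explicit. As a sketch, assume three mutually touching discs whose centres are at rest relative to each other, so that the tangential velocity at the contact of discs $i$ and $j$ reduces to the rotational part $\omega_i R_i + \omega_j R_j$. Rolling without sliding at all three contacts would then require

```latex
\omega_1 R_1 + \omega_2 R_2 = 0, \qquad
\omega_2 R_2 + \omega_3 R_3 = 0, \qquad
\omega_3 R_3 + \omega_1 R_1 = 0
\quad\Longrightarrow\quad
\omega_1 R_1 = -\omega_2 R_2 = \omega_3 R_3 = -\omega_1 R_1 ,
```

which is only possible if no disc rotates at all. As soon as one disc spins, at least one of the three contacts must slide.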

On a macroscopic level the whole array can be viewed as one “block” slid- 
ing over the plane. Although on this level the rotations are internal degrees of 
freedom of the block, they do have important implications for the macroscopic 
friction which is no longer a Coulomb law. Another example is the calculation
of the force network as in Fig. 1. Clearly an accurate implementation of static
friction is needed to study these questions. 

Reading Sect. 2.3 about the implementation of Coulomb friction in the soft- 
particle model one can hardly avoid some feeling of dissatisfaction about the 
treatment of static friction: it is really rather ad hoc, the representation of mi- 
croscopic dynamics by the imaginary tangential spring seems a bit unnatural and 
is only justified by the results at the macroscopic level. Lasting contacts involve 
undamped fictitious oscillations on the microscopic level. In the event-driven 
algorithm (Sect. 3) lasting contacts cannot be treated at all. In this section an 
exact implementation of Coulomb friction will be described, based on the work 
of J. J. Moreau and M. Jean [55, 56]. 




Fig. 12. Frustration of rotational degrees of freedom in the presence of friction

4.2 Contact Laws and Equations of Motion

Some basic notions of different types of contacts are needed to formulate the
algorithm. First the classification according to the tangential velocity and
acceleration:

- A contact is called sticking if $v_t = \dot v_t = 0$. The tangential force at such a
  contact can be anything between $-\mu_s F_n$ and $+\mu_s F_n$. Its precise value depends
  on the other forces acting on the particles in contact: the static friction force
  has to compensate them such that $\dot v_t$ is indeed zero.

- A contact is activated if $v_t = 0$, but $\dot v_t \neq 0$. Then $F_t = -\mu_s F_n\,\mathrm{sign}(\dot v_t)$.

- A contact is sliding if $v_t \neq 0$. Then $F_t = -\mu_d F_n\,\mathrm{sign}(v_t)$.

The allowed combinations of ($\dot v_t$, $F_t$) for the case $v_t = 0$ are plotted in Fig. 13, the
Coulomb graph. Because of the multi-valuedness at $\dot v_t = 0$ this is not a function.

Similarly for the allowed combinations of ($\dot v_n$, $F_n$) for $v_n = 0$ one obtains the
so-called Signorini graph (Fig. 14) if one considers perfectly rigid particles. One
distinguishes the following:

- Closed contacts: $v_n = \dot v_n = 0$. For them the normal force can have any value
  $\geq 0$. The value adjusts itself such that it compensates all forces which would
  lead to an interpenetration of the particles.

- Opening contacts: $v_n = 0$, $\dot v_n > 0$. For such a contact $F_n$ has to be zero.

Finally, if the normal component of the relative velocity $v_n \neq 0$, there cannot be
a contact between the perfectly rigid particles.

The equations of motion give in addition a dynamic relationship between the
accelerations and forces. For every contact the relative normal and tangential
accelerations are given by

$\dot v_n = \frac{1}{m^*} F_n + A_n,$ (38)

$\dot v_t = \left(\frac{1}{m^*} + \left(\frac{R^2}{I}\right)^*\right) F_t + A_t.$ (39)

The first terms on the right-hand side are the usual two-body interactions with
the reduced mass $1/m^* = 1/m_1 + 1/m_2$, and $(R^2/I)^* = R_1^2/I_1 + R_2^2/I_2$, where
$m_i$, $R_i$ and $I_i$ are the masses, radii and moments of inertia of the two particles
in contact. $A_n$ and $A_t$ are accelerations due to contacts with further particles
or due to external forces such as gravity. They are linear functions of the forces
acting at the other contacts.

Without $A_n$ the only solution ($\dot v_n$, $F_n$) of (38) allowed by the Signorini graph
is $\dot v_n = F_n = 0$. However, it is easy to imagine situations in which either
$\dot v_n \neq 0$ or $F_n \neq 0$. Consider, for example, three cylinders as in Fig. 15. In the
left example the middle one is accelerated upward due to the frictional force
exerted by the rotating cylinders to the left and to the right. The normal force at
the opening contact with the plane is zero. Similarly, the right example shows a
situation in which the normal force at the contact between the middle cylinder and
the plane is directed upward but does not lead to an acceleration because it is
compensated by the weight.

Fig. 15. Two examples, where $\dot v_n = F_n = 0$ is violated for the contact between the
middle particle and the support; see text

Fig. 16. The line $(1/m^* + (R^2/I)^*)^{-1}(\dot v_t - A_t)$ has two intersections with the Coulomb
graph: the tangential acceleration at the contact is undetermined. It can be zero or
negative in this example



In general the intersection between the linear function $F_n(\dot v_n) = m^*(\dot v_n - A_n)$
and the Signorini graph gives a solution for $F_n$ and $\dot v_n$ simultaneously. Similarly
the intersections of the corresponding linear function $F_t(\dot v_t)$ obtained from (39)
with the Coulomb graph are solutions for $F_t$ and $\dot v_t$ simultaneously. In fact,
as $A_n$ and $A_t$ depend on all the other contact forces, one has to determine
($F_n$, $\dot v_n$, $F_t$, $\dot v_t$) for all contacts simultaneously. One important fact is illustrated
by Fig. 16: as $\mu_s > \mu_d$ the solution obviously is not always unique; the straight
line can have two intersections with the Coulomb graph. Both pairs ($F_t$, $\dot v_t$) are
solutions of the dynamic equations and the contact laws simultaneously. Which
of the solutions is realized depends on the history. This, of course, reminds us
of the hysteretic behaviour in stick-and-slip motion.
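
For a single contact with given $A_n$ and $A_t$ the intersections can be written down in closed form. The following Python sketch is an illustration under simplifying assumptions that go beyond the text: equal static and dynamic friction coefficients ($\mu_s = \mu_d = \mu$), so that the solution is unique, and a contact with $v_n = v_t = 0$. The function names and the parameter `c`, which stands for $1/m^* + (R^2/I)^*$, are hypothetical.

```python
def solve_normal(a_n, m_star):
    """Intersect F_n = m_star * (dv_n - A_n) with the Signorini graph.

    Returns (F_n, dv_n): a closed contact if A_n <= 0, an opening one otherwise.
    """
    if a_n <= 0.0:
        return -m_star * a_n, 0.0        # closed: F_n >= 0 compensates A_n
    return 0.0, a_n                      # opening: no force, positive acceleration


def solve_tangential(a_t, f_n, mu, c):
    """Intersect dv_t = c * F_t + A_t with the Coulomb graph for mu_s = mu_d = mu.

    Returns (F_t, dv_t): sticking if the required force lies within the Coulomb
    limit, otherwise the contact is activated and starts to slide.
    """
    f_stick = -a_t / c                   # force that would keep dv_t = 0
    if abs(f_stick) <= mu * f_n:
        return f_stick, 0.0
    sign = 1.0 if a_t > 0.0 else -1.0
    f_t = -mu * f_n * sign               # friction opposes the resulting acceleration
    return f_t, c * f_t + a_t
```

In a many-contact system $A_n$ and $A_t$ depend on the forces at all other contacts, so these single-contact solutions would only be one building block of an iteration over all contacts.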

Radjai, Brendel and Roux [57] showed that indeterminacy of forces is a
generic feature of the contact laws: even if one assumes $\mu_s = \mu_d$ the number
of possible static solutions grows linearly with the number of particles in a dense
two-dimensional system. In the one-dimensional example (Fig. 11) with $\mu_s = \mu_d$
this does not happen. Then the solution can be found by an iterative procedure
described in the next section. In the case of indeterminacy this does not suffice
and additional specifications, described in [57], have to be made.

4.3 Iterative Determination of Forces and Accelerations 

In order to determine the accelerations and the forces for the example system
(Fig. 11) one starts with an arbitrary guess about the status of all contacts. For
example one can assume that all contacts are closed and sliding (with randomly
chosen initial rotation velocities). Then $\dot v_n = 0$ and $F_t = -\mu_d F_n\,\mathrm{sign}(v_t)$ are given
for all contacts. $F_n$ and $\dot v_t$ are then calculated from the equations of motion. If
all pairs ($F_n$, $\dot v_n$) and ($F_t$, $\dot v_t$) are allowed by the Signorini and Coulomb graphs,
respectively, one would have found the solution. If not, one has to correct the
initial guess about the status of the contacts. For example, if one of the $F_n$ turns
out to be negative, one drops the assumption that this contact was closed and
lets it be opening, i.e. one sets $F_n = 0$ instead of $\dot v_n = 0$. After a few iterations
a valid solution will be found. Then one knows all tangential accelerations and
hence can calculate the time at which one of the tangential relative velocities
drops to zero. Then this contact becomes sticking, but in the course of further
iterations it may be activated again. In this way the time evolution of the system
can be calculated.

4.4 Results 

The simple example system (Fig. 11) shows a surprisingly rich behaviour. No 
matter what the initial rotation of the cylinders was, sliding and sticking con- 
tacts organize themselves such that a unique state with constant acceleration is 
reached. This state can consist of up to three spatial domains. The first, next to 
the pushing block, has all contacts between cylinders sticking and all contacts 
between cylinders and the plane sliding. This means that the cylinders are either 
not rotating or counterrotating. In the third domain, at the end of the array, all 
cylinders roll, i.e. their contacts with the plane are sticking, whereas the contacts 
among them are sliding. In both the first and the third domain the absolute val- 
ues of the angular accelerations of the cylinders are constant. In between there 
is the second domain in which all contacts are sliding, the ones among the cylin- 
ders as well as the ones with the plane. The angular accelerations increase from 
cylinder to cylinder as one goes from domain 1 to 3. 

The lengths of the three domains depend on the pushing force and the friction 
coefficients for the contacts between cylinders or with the plane, respectively. The 
first domain grows with increasing pushing force (and hence increasing normal 
force between the cylinders). 

The linear acceleration $\ddot x$ of the whole array of $N$ cylinders of mass $m$ is given
by

$F_{ext,1} - F_{ext,2} - F_{friction} = N m \ddot x,$ (40)

where $F_{ext,1}$ is the pushing force and $F_{ext,2}$ is the friction force of the terminating
block with the plane. Equation (40) defines the global friction force $F_{friction}$.
Since the first domain, where the cylinders slide over the plane, grows with
increasing pushing force, $F_{friction}$ increases, too, see Fig. 17. This means that
on the macroscopic scale the Coulomb law, stating that the friction force only
depends on the normal force $Nmg$, is no longer valid in granular materials.



Fig. 17. The global friction force as defined in (40) grows with increasing pushing force
$F_{ext,1}$ ($F_{ext,2} = 0.01$), as fewer and fewer cylinders are rolling

5 The Bottom-to-Top Restructuring Model

5.1 The Algorithm and its Justification (with E. Jobs)

The bottom-to-top restructuring model (BTR model for short) is an algorithm
designed for high-speed simulation of granular systems which periodically return
to a static packing in a container of any shape and exposed to an external,
directed force field (such as gravity). It was first applied by Jullien, Meakin and
Pavlovitch [27] to study size segregation under vertical shaking. Figures 5 and 6
were obtained with this model.

Although it is possible to simulate a whole class of granular systems with 
this algorithm [58], the most intuitive example is a vertically vibrating box filled 
with particles. Imagine that the time $\Delta t$ between two shakes is long enough that
all particles settle in a static packing. The idea of the BTR model is to formulate 
transition rules from one static configuration to the next. 

For the vertically shaken box the following transition rule has been suggested 
by Jullien, Meakin and Pavlovitch [27]: all particle positions are updated one by 
one in the order of ascending height. Starting with the lowest particle each 
particle seeks its nearest local minimum, ignoring all higher particles. 

The physical picture behind this is that each shake dilates the packing. The 
particles then fall, one after the other, in the direction of the gravitational field 
until they hit the surface of already settled particles or the walls of the container. 
Then, each particle follows the path of steepest descent until it reaches a local 
minimum, while higher ones are still in free flight. Once the particle has reached 
this position it will stay there until the end of the restructuring cycle. After all 
particles have found their minima they are again collectively displaced or dilated 
and a new restructuring cycle starts. 
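
The transition rule can be illustrated with a strongly simplified lattice caricature. This is not the off-lattice disc algorithm of [27]: in the Python sketch below, particles are assumed to be unit squares occupying integer columns of a container of width `width`, and "steepest descent" is reduced to hopping to the lower neighbouring column until a local minimum of the already settled surface is reached. All names are hypothetical.

```python
def btr_cycle(particles, width):
    """One restructuring cycle on a lattice.

    particles: list of (column, height) pairs describing the old packing.
    Returns the list of (column, height) pairs of the new packing, in the
    order in which the particles were processed.
    """
    surface = [0] * width                # height of settled material per column
    settled = []
    # Particles are updated one by one in the order of ascending height.
    for col, _ in sorted(particles, key=lambda p: p[1]):
        while True:
            neighbours = [c for c in (col - 1, col + 1) if 0 <= c < width]
            lowest = min(neighbours, key=lambda c: surface[c], default=col)
            if surface[lowest] < surface[col]:
                col = lowest             # roll down to the lower neighbour
            else:
                break                    # local minimum: the particle settles
        settled.append((col, surface[col]))
        surface[col] += 1                # higher particles see the new surface
    return settled
```

For instance, a tower of three particles stacked in the middle column of a container of width 3 relaxes into a flat layer, whereas a layer that is already flat is left unchanged.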

Extreme material properties are implicitly assumed in this rule. First, the
coefficient of restitution has to be close to zero so that one can neglect that
particles bounce back. Second, the dynamic and static friction coefficients have
to be large so that particles cannot escape from shallow minima and settled
particles are not kicked out of their positions. In contrast to molecular dynamics
calculations, the algorithm needs no force calculations but only geometrical
considerations (to find the path of steepest descent). Thus the BTR model is
particularly adapted to study geometrical effects. Whereas molecular dynamics
often shows a tendency towards fluidization, the BTR model by construction lets
a configuration freeze in after each cycle.



5.2 Simulation of a Rotating Drum 
(with T. Scheffler and G. Baumann)



In order to understand the basic properties of the behavior in a slowly rotating 
drum, we adapted the BTR model to this situation. As above, we consider only
the case where after each avalanche the packing adopts a static configuration.
This configuration is then rigidly rotated by a small angle. Then, starting with 
the lowest particle, the same restructuring rule as in Sect. 5.1 is applied to find 
the new configuration. 

However, now the justification of the algorithm has to be reconsidered. Is it 
really allowed to ignore the higher particles when one particle relaxes into the 
nearest local minimum, even if there is no significant dilation of the packing, 
in contrast to the shaking? Figure 18 shows the traces of all particles during
one update: although in principle allowed, unphysical restructurings occur only
extremely rarely. The reason is that one has to divide the filling into a bulk and
a surface region. In order to find the new position of a bulk particle all higher
particles may be ignored, because they move essentially on parallel trajectories
and therefore cannot change the result. In the avalanche zone, however, higher
particles easily give way, and again do not significantly influence the rolling down
of particles below.

Using this algorithm the first convincing simulation of radial size segregation 
was made [58], which compares well with experiments [25], see Fig. 5. It was 
shown that segregation occurs for all ratios of the disc radii without changing 
its character [21], in contrast to segregation due to vertical shaking. Ergodicity 
of the trajectories of a tagged particle in the drum has been shown [59]. An 
interesting prediction of the BTR model is that a monodisperse filling generally 
has a lower (dynamic) angle of repose than a bidisperse one, where the material 
of all particles is the same. This was found in simulations of a two-dimensional 
drum [60]. For smaller drums the difference between the angles of repose for 
the two fillings decreases and vanishes eventually. An experimental check of this 
prediction should be a test of the applicability of the BTR model. Also the 
fluctuations of the surface angle have been investigated in this model. Their 
power spectrum decreases with the inverse frequency squared [21].

Fig. 18. Traces of the particles in one elementary rotation step. From [21]



6 Conclusion 

The last four sections give an overview of some models used to simulate granular
media. Each one has its particular strengths and limitations. All of them
have been successfully applied to study phenomena described in the first section. 
It has been one purpose of this chapter to point out that the simulation model 
has to be chosen such that the phenomenon one wants to study is not outside 
the range of applicability of the model. 

The most versatile model uses soft-particle molecular dynamics with forces 
that depend on the overlap. It is simple and can be used over a wide range of 
densities. However, it is not the most efficient model, and it is hampered by an 
inherent trend towards fluidization. The artefacts due to the detachment effect 
and brake failure have to be considered if one wants to simulate dense systems 
or rapid flows with this model. 

In the fluidized state event-driven simulations are more efficient. They avoid
the artefacts of the soft-particle model; however, they fail due to inelastic
collapse if the system develops long-lasting contacts.

By contrast, contact dynamics has its most spectacular applications just for 
systems with long-lasting contacts. It should be kept in mind that this simulation 
method can be generalized to allow for the opening and closing of contacts 
[7, 56, 55], and it seems to me that this method can be developed into a more 
versatile model than ordinary soft-particle molecular dynamics. However, the 
model uses rigid particles, i.e. elasticity is only represented by a coefficient of 
restitution, which is a concept of binary collisions. Therefore one can easily 
construct special multiple collisions, in which rigid particles behave differently 
from, for example, steel spheres [62]. The efficient implementation of this method 
is not easy. 

Finally, the bottom-to-top-restructuring model turned out to give unexpect- 
edly good results in situations, in which the system evolves from one static con- 
figuration to the next under the influence of gravity. The transition rule can be 
justified in the limiting case of vanishing restitution coefficient and large friction. 
Among the discussed models this one is certainly computationally most efficient. 
On the other hand, it cannot be used to study fluidized granular media. 

Some simulation techniques have not been addressed in this review such as 
cellular automata [63, 64], which are also extremely efficient, and finite element 
methods [65]. So far, cellular automata have only been applied in two dimen- 
sions in the context of granular media. The three-dimensional case is expected to 
require an embedding into a higher-dimensional space, as is known from hydro- 
dynamic cellular automata. Finite element methods do not deal with particle 
dynamics, as the methods discussed here, but treat the medium on a coarser 
scale. 

Abstract. There are many theories about why we age; this chapter will present some 
theories that are suited for simulation without claiming that they are the truth. Perhaps 
Darwinian selection leads to an accumulation of bad mutations in old age since those 
which happened in young age and prevented procreation were weeded out much more 
strongly. This balance of mutations and selection pressure has been dealt with in many 
Monte Carlo simulations since 1993. Particularly efficient is the simulation of Thadeu 
Penna which stores the mutations in the 32 bits of one computer word, with each bit 
position corresponding to one year in life, and on-bits signaling the presence of the 
mutation, off-bits the absence. 



1 Introduction 

Except for this writer, everybody gets old. Even Brigitte Bardot (or Ronald 
Reagan, or Marxism...). Why? How can we avoid that? Is there eternal youth 
outside of the movie “Death becomes her”? I don’t know. But I know how to 
program computers. Thus I present here one of the many aging theories which 
is particularly suited for computers: evolutionary population dynamics [1]. 

This is part of a widespread field, for which Ausloos coined the acronym 
BEER: biological evolution engineering research. The aim of modeling biology 
is not to be as realistic as possible but to be as simple as is compatible with the 
aspects one wants to study. A single cell is too complicated to be put completely 
onto today’s computers, just as a glass of beer contains too many molecules to be 
simulated. Nevertheless we understand today quite well the transition between 
liquid and vapor, through simple approximations such as the van der Waals 
equation, or simple models like the Ising model, proposed by Ern(e)st Ising’s 
advisor Lenz 75 years ago at a conference close to the region of Chemnitz. A 
water molecule is much more complicated than can be described by a single-bit 
variable via occupied and empty. Nevertheless, for 20 years we have believed that 
water and all other simple fluids have the same critical exponents at the
liquid-vapour critical point as the three-dimensional Ising model. So the Ising model 
does not describe hydrogen bonding and the anomalous properties of water near 
4 °C, but it does describe the behavior near the critical point. 

* Software included on the accompanying diskette. 

So what is the analog of the Ising model for aging? I still don’t know but 
perhaps we will know ten years from now. There are major hurdles to a simple 
understanding: The boiling of water is as commonplace as is biological aging, 
but experiments are clear, reproducible, and give a few numbers of interest such 
as the densities on the coexistence curve or the vapor pressure as a function of 
temperature. What is aging? For lack of better definitions we defined it as the 
reduction of survival probabilities with advancing age of the individual. Clearly, 
this is not what we see in Brigitte Bardot. Also, experiments are difficult to 
reproduce, and biologists have not yet agreed on the major cause of aging. Simi- 
larly, in other fields of biology the experiments are difficult and seldom clear-cut, 
and thus do not easily differentiate between good and bad theories. For an anal- 
ogy with physics history, imagine you should have simulated fluids before it was 
known that they consist of molecules and that heat is not a separate substance 
called phlogiston. You then might have made some nice phlogiston simulations 
which gave good results but had nothing to do with the truth. In this sense, 
biologically motivated simulations may also turn out to be based on completely 
wrong ideas, even if the results are correct. This also holds for aging theories. 

This chapter is not about evolutionary or genetic algorithms. These are names 
for computational techniques which are inspired by mother nature, like sex or 
neural networks. Such simulations can then be applied to solve the spin glass 
ground state, or the traveling salesman problem. We, instead, use methods known 
from computational physics to model biology. 



2 Concepts and Models 

The survival probability S is the number of living beings of age t + 1, in relation 
to the number for age t one time unit earlier. Whether our time unit is one year, 
or some smaller interval, depends on the biological application. For simplicity 
we always talk about years and have in mind fish, or human beings, and not so 
much fruit flies or nematode worms. Aging then is the decay of fitness with age, 
as is observed after the particular dangers for newborn children are overcome. 
The death rate 1 - S increases with age roughly exponentially, as found by 
Gompertz in the last century. Of course, there are important exceptions: two 
world wars produced huge gaps in the European population whereas our model 
neglects such phenomena. Mathematically the death rate cannot become bigger 
than one, even though this was asserted in the January 1995 issue of Scientific 
American. The Gompertz law is thus valid for not too young and not too old 
people, in stable populations like that of Taiwan at the beginning of this century. 
All other aspects of aging, except this reduction of survival rate with increasing 
age, are neglected now. 
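
The definition of S can be illustrated with a minimal sketch. It assumes a stationary population, so that the number of individuals in neighboring age classes of one census can stand in for following a single cohort; the function names and the example counts are hypothetical and not taken from the diskette program.

```python
def survival_rates(counts):
    """Return S(t) = N(t+1) / N(t) for consecutive age classes.

    counts[t] is the number of living individuals of age t
    (assumption: stationary population, one census).
    """
    rates = []
    for younger, older in zip(counts, counts[1:]):
        if younger <= 0:
            raise ValueError("age class without individuals")
        rates.append(older / younger)
    return rates


def death_rates(counts):
    """Return the death rate 1 - S(t) for each age t."""
    return [1.0 - s for s in survival_rates(counts)]


if __name__ == "__main__":
    # Hypothetical age distribution, for illustration only.
    example = [1000, 990, 950, 760, 380]
    print(survival_rates(example))
    print(death_rates(example))
```

In this picture aging simply means that the list returned by `survival_rates` decreases with age.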

Why does this survival rate go down? Actions of oxygen radicals may be 
responsible but are hardly suited for simulations similar to Ising models. Sex 
has been found to be dangerous to your health by some: Van Voorhies [2] found 
a significantly faster decrease of survivability in mated males compared with 
unmated males (also known as the Duran-Duran effect from the movie
“Barbarella”). Sure it is fun if I can blame women for aging, but I doubt that a 
proposal to abolish sex has much chance of being funded or followed; even if 
implemented, who would then cite our papers 100 years from now? Much more 
similar to traditional physics simulations is the idea that random mutations 
move our genes away from the ideal configurations and thus cause aging, just 
like random thermal motion moves atoms away from their ideal lattice sites and 
produces liquids and gases. 

If mutations are random they happen for all ages, and thus at first they 
might be thought to give a constant death rate independent of age. But this 
is not true due to Darwinian selection of the fittest: a mutation endangering 
the life of a person below the reproductive age reduces the number of offspring 
much more than a mutation affecting us only late in life when we barely get any 
children. Thus after some generations, the mutation for early life is restricted to 
a very small fraction of the population whereas the mutation for late age can 
be very widespread. In this sense, selection pressure tries to keep us close to the 
ideal genetic makeup whereas random mutations move us away from it, just like 
energy minimization tries to keep an equilibrated physics system close to the 
ground state while entropy (random thermal fluctuations) moves it away. The 
balance of selection and mutation gives the equilibrium aging curve S(t) just as 
the balance of energy and entropy gives the free energy minimum. 

Thus an important ingredient of such an aging theory is an age structure 
with no reproduction in young age. In the simplest case, proposed by Partridge 
and Barton [1], we take just two age intervals: juveniles from age 0 to age 1, and 
adults from age 1 to age 2. Reproduction is possible at age 1 and age 2 only, and 
is followed by death after age 2. We then have the juvenile survival rate J and 
the adult survival rate A, and aging simply means 

J > A . 

Mutations in such models were simulated by Monte Carlo methods in [3]. 

More realistic models use many aging steps, such as 32 or 64 years, and 
then it is tempting to store the whole genetic setup in a single computer word 
consisting of 32 or 64 bits. This is the case in the Penna model [4], and mutations 
in such models have also been studied by Monte Carlo [5]. Bit position t in such 
models then indicates whether this individual at age t and thereafter will suffer 
from the bad effects of a particular mutation. 

In all cases one has to be careful to distinguish between hereditary and so- 
matic mutations. The first ones are passed on to the offspring, like hemophilia, 
the second ones are not, like skin cancer from sun bathing. One of the crucial 
questions in any model is whether or not due to the accumulation of bad heredi- 
tary mutations from one generation to the next the whole population finally dies 
out: “mutational meltdown” [6]. 



3 Techniques 

Some two-age models are reviewed in [7], and the program on the diskette thus 
gives only a one-age model from which the reader can construct a two-age
program. This one-age model does not explain aging, but explains how the
accumulation of hereditary bad mutations is programmed and may destroy the 
population. Initially, our (juvenile) survival rates juven are all set to unity and 
of course there are no adult survival rates in this one-age simplification. Then 
we make max iterations during which we not only calculate the average survival 
rate av but also check for each individual if it dies. This is done in loop 2: if 
a random number rand(0) between 0 and unity is greater than the survival 
rate, the individual dies; otherwise it survives and suffers a random mutation 
which reduces the survival rate by a random amount between 0 and e. Now the 
survivors reach reproductive age, each one gets m children in loop 5, and each 
of these children inherits the (mutated) survival probability of its parent. Sex is 
avoided here to prevent any violation of pornography laws by this article; thus 
bacteria might be better suited than Homo sapiens for this simple program. In 
the same way, a second age interval, adulthood, can be programmed with its 
own mutations added to those of youth. 
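
The one-age procedure just described can be sketched in a few lines of Python. This is not the program from the diskette: the names `juven`, `m`, `eps` and `max_iter` merely echo the quantities mentioned in the text, and the sketch assumes non-overlapping generations (parents are removed after reproducing) and a survival rate that is never allowed to drop below zero.

```python
import random


def one_age_meltdown(n0=50, m=2, eps=0.2, max_iter=1000, seed=1):
    """Sketch of the one-age model with hereditary bad mutations.

    Returns the population size after each iteration and the average
    survival rate at the start of each iteration.
    """
    rng = random.Random(seed)
    juven = [1.0] * n0              # survival rates, initially all unity
    sizes = [n0]
    averages = []
    for _ in range(max_iter):
        if not juven:               # population has died out
            break
        averages.append(sum(juven) / len(juven))
        children = []
        for rate in juven:
            # the individual dies if the random number exceeds its rate
            if rng.random() > rate:
                continue
            # survivors suffer a mutation lowering the rate by 0..eps
            mutated = max(0.0, rate - eps * rng.random())
            # each survivor gets m children inheriting the mutated rate
            children.extend([mutated] * m)
        juven = children            # assumption: parents do not live on
        sizes.append(len(juven))
    return sizes, averages


if __name__ == "__main__":
    sizes, averages = one_age_meltdown()
    print("largest population:", max(sizes))
    print("iterations until extinction:", len(sizes) - 1)
```

With m = 2 the population doubles in the first iteration, since all survival rates are still unity, and later shrinks once the accumulated mutations have reduced the rates sufficiently.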

This simulation shows in a few seconds of computer time how for m = 2 the 
population first grows since each individual gets two children. However, after 
some time the hereditary bad mutations have accumulated to such an amount 
that the survival rate sinks below 1/2, and the population shrinks instead of 
growing until it vanishes completely: mutational meltdown. The same effect also 
occurs in two-age models and can be avoided if we also allow positive mutations, 
or assume that these mutations are somatic and not inherited, as discussed 
extensively in [3]. The aging condition J > A is often but not always fulfilled. 

More realistic, of course, is a model with many ages. A Fortran program for 
the Penna bit string model was listed and explained by Penna and Stauffer [5]. 
Basically, each bit is set if at the age corresponding to this bit position a bad 
mutation becomes active. The individual feels all bad mutations of the present 
and all previous years. If this total number of bad mutations is larger than a fixed 
threshold or larger than the average number for the whole population (Thoms 
et al. [5]), the individual dies; otherwise it lives at least a year longer. After some 
minimum age of reproduction, each individual (again asexually) can get children 
at each age, and each child differs by one bit from the genes of the parent. Thus 
the children in the same family have slightly different genes, and mutational 
meltdown can be avoided. The advantage of this model is that all genes can 
be stored in a single computer word, saving lots of storage and some computer 
time. (Bernardes [5] showed that replacing bits by small integers having more 
than two possible values does not drastically change the results.) 
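
The bit-string bookkeeping can be illustrated as follows. This is a minimal sketch and not the Fortran program of [5]: the function names are hypothetical, ages and bit positions are counted from 0, only the fixed-threshold death rule is shown, and the child genome is produced by toggling one random bit, which is a literal reading of "differs by one bit" (a variant allowing only bad mutations would set the bit with `|` instead of toggling it with `^`).

```python
import random

GENOME_BITS = 32  # one bit per year of life, stored in a single word


def active_mutations(genome, age):
    """Number of bad mutations felt at `age`: all set bits up to that age."""
    mask = (1 << (age + 1)) - 1
    return bin(genome & mask).count("1")


def lifespan(genome, threshold):
    """First age at which the active mutations exceed `threshold`.

    Returns GENOME_BITS if the threshold is never exceeded.
    """
    for age in range(GENOME_BITS):
        if active_mutations(genome, age) > threshold:
            return age
    return GENOME_BITS


def child_genome(parent, rng):
    """Copy of the parent genome with one randomly chosen bit toggled."""
    return parent ^ (1 << rng.randrange(GENOME_BITS))


if __name__ == "__main__":
    rng = random.Random(0)
    parent = 0b1011                      # mutations at ages 0, 1 and 3
    print(active_mutations(parent, 2))   # mutations felt at age 2
    print(lifespan(parent, threshold=2))
    print(bin(child_genome(parent, rng)))
```

Storing the genome in one integer is what makes the counting a single mask-and-count operation per individual and year.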



4 Results 

With this bit-string model [5] one may find populations in reasonable
agreement with natural age distributions. The survival rate is close to unity in youth, 
decays somewhat at middle age, and decays much more rapidly at old age.
However, exceptions exist with an additional minimum in the survival rate at young 
age (Thoms et al.). The mutations accumulate mostly in old age where their 
frequency can be ten times higher than in youth, even if initially all mutations 
were distributed randomly over all ages, or if we started with no mutations at 
all. In other words, families with lots of bad mutations in young age have mostly 
died out whereas those with many mutations affecting old age have survived and 
proliferate. 

This balance of selection pressure (energy) and mutation accumulation
(entropy) and its relation to aging becomes particularly clear if we simulate Pacific 
salmon, a fish which dies soon after reproduction (“catastrophic senescence”). If 
we assume that reproduction only happens at some specific age, then all animals 
beyond that age no longer contribute to the population growth. (Parental care, 
and grandmotherly help, is ignored in this model.) Thus mutations acting on 
ages beyond this reproductive age can accumulate without being weeded out by 
selection pressure. After some time practically all fish above reproductive age 
are killed by their inherited mutations, whereas the survival rates below that 
reproductive age are close to unity [8]. The simulation results agree well with 
analytical solutions [9]. This first-order transition from a survival rate close to 
unity to one close to zero indicates how important reproduction is in biology. 
The publish or perish philosophy is a sociological example in a similar sense 
for universities. (The first-order transition claimed by Partridge and Barton to 
explain Pacific salmon turned out to be an artifact of their choice of variables.) 

The Penna-Moss theory [9] gives an exact solution for large populations for 
the special case where reproduction occurs only at one precise age $R$, and when 
one bad mutation acting against the animal is enough to kill it. For simplicity 
we assume an infinite supply of food and space. Then, in the stationary state 
$N_0$ babies give $N_1 = (31/32) N_0$ animals of age 1, since their randomly mutated 
bit can be anywhere except in the first position; $N_1$ animals of age 1 will give 
$N_2 = (30/31) N_1 = (30/32) N_0$ animals of age 2, since their mutation is allowed 
to be anywhere in bits 3 to 32, etc. For age $k \le R$ we have thus $N_k/N_0 = 
(32 - k)/32$, and older animals die. The birth rate $b$ thus must obey $N_0 = b N_R$ 
or $b = 32/(32 - R)$ in a 32-bit model. Mutational meltdown is avoided for larger 
$b$ since then enough children get their new mutation in the already mutated bits 
relevant for older ages $k > R$ where these mutations do not change anything. 
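
The arithmetic of this stationary state can be checked with exact fractions. The sketch below only restates the formulas of the preceding paragraph; the function names and the example value R = 10 are illustrative assumptions.

```python
from fractions import Fraction


def stationary_fraction(k, bits=32):
    """N_k / N_0 = (bits - k) / bits for ages k up to the reproduction age."""
    return Fraction(bits - k, bits)


def chain_fraction(k, bits=32):
    """Same ratio, built year by year as in the text.

    Going from age a-1 to age a, the single new mutation must avoid
    bit a, which leaves (bits - a) of the (bits - a + 1) allowed bits.
    """
    frac = Fraction(1)
    for age in range(1, k + 1):
        frac *= Fraction(bits - age, bits - age + 1)
    return frac


def critical_birth_rate(r, bits=32):
    """Birth rate b obeying N_0 = b * N_R, i.e. b = bits / (bits - R)."""
    return 1 / stationary_fraction(r, bits)


if __name__ == "__main__":
    print(stationary_fraction(1), stationary_fraction(2))
    print(critical_birth_rate(10))
```

The product in `chain_fraction` telescopes, which is why the year-by-year factors 31/32, 30/31, ... collapse to (32 - k)/32.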

If we can learn survival strategies from Pacific salmon, can we also help fish 
to survive against human overfishing? Early simulations [10] used a birth rate 
fluctuating by a factor of nearly 30 to explain strong fluctuations in the fish 
populations, but such fluctuations are hardly realistic. The Penna model allows 
the simulation of age-dependent fishing, and then one can show that a slight 
increase of fishing may destroy the whole population (as happened with northern 
cod off Newfoundland in 1993, and is happening there now with Greenland 
halibut). If, on the other hand, young fish are allowed to live, then the fish 
population can survive [11]. Perhaps in the future, human effort in computers 
can balance better the human stupidity in overfishing. 

It is often asserted that mutational meltdown is relevant only for asexual 
or very small sexual populations. This opinion is too optimistic according to 
Bernardes. Sex often helps but does not always ensure the survival of the
population [12]. One author [13] even claimed that there are no advantages in sex 
and that nature would be better off without men. This should be a challenge for 
male scientists to investigate in greater detail the effects of sexual reproduction 
on aging. 

Abstract. Chemical reactions seldom obey the “well-stirred-reactor” scheme, so that 
the actual spatial distribution of reactants is of importance at all times. Recent
advances in the field are due to improved numerical techniques, which often paved the 
way for a deeper analytical treatment. We survey the knowledge on the $A + B \to 0$
reaction, paying special attention to mixing techniques (stirring transformations as well 
as Lévy processes). 

1 Introduction 

Model chemical reactions are a very specific object of study, since despite their 
simply structured “rules of the game” they display very rich temporal patterns 
of behavior. From the point of view of simulations, model chemical reactions 
present the important advantage that in several particular cases exact solutions 
(or at least, asymptotic behaviors) are known; this provides then an excellent test 
for the numerically obtained results. On the other hand, as will be discussed in 
the following, analytical approximations may be quite misleading, when pushed 
beyond their (often poorly known) limits of validity. In several, very impor- 
tant instances, computer simulations have played a decisive role in showing that 
well-established patterns of thought were incorrect. In many cases the plotted 
results of simulations are intuitively easy to grasp; this has helped very much in 
establishing new ways of understanding the underlying phenomena. 

In this chapter we will present several cases in which simulations were exceed- 
ingly helpful in advancing the analytical approach. We will start from simple, 
bimolecular reactions on simple lattices; this will help in keeping the basic picture 
simple, without being immediately submerged by realistic details. 

2 The Basic Kinetic Approach 

Let us start from the basic kinetic scheme [1]. Interestingly, this procedure in 
widespread use in physical chemistry is not adequate in describing many
intriguing relaxation forms, as we proceed to show. 

General irreversible reactions are of the type: 

$$ A_1 + A_2 + \cdots + A_n \to 0 . \tag{1} $$

Assuming the reaction rate to be $k$, the classical description in the kinetic scheme 
leads to the following system of (in general) nonlinear differential equations: 

$$ \frac{dA_j(t)}{dt} = -k \prod_{i=1}^{n} A_i(t) , \qquad j = 1, \dots, n . \tag{2} $$

In (2) $A_i(t)$ denotes the concentration of the $i$th molecular species, whose initial 
value is $A_{i0}$. The simplest case of (1) is the unimolecular reaction ($n = 1$) of the 
form $A \to 0$, whose solution is exponential: 

$$ A(t) = A_0 e^{-kt} . \tag{3} $$

Bimolecular reactions, for which $n = 2$, are of the type 

$$ \frac{dA(t)}{dt} = -k A(t) B(t) = \frac{dB(t)}{dt} . \tag{4} $$

We set $C = B_0 - A_0$ and have as a general solution of (4): 

$$ \frac{1 + C/A(t)}{1 + C/A_0} = e^{Ckt} . \tag{5} $$

From (5) we infer for $B_0 \gg A_0$ that $C \simeq B_0$ and thus $C/A(t) \gg 1$. Hence for 
$B_0 \gg A_0$ the decay of the minority species is quasiexponential: 

$$ A(t) \simeq A_0 e^{-B_0 k t} . \tag{6} $$

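On the other hand, if $A_0 = B_0$ then $C = 0$ in (5). An expansion in small $C$ leads 
to the decay 

$$ A(t) = \frac{A_0}{1 + A_0 k t} , \tag{7} $$

from which at longer times, $t \gg (A_0 k)^{-1}$, an algebraic time dependence emerges: 

$$ A(t) \simeq \frac{1}{kt} . \tag{8} $$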

We pause to note that a very similar behavior is also obtained for the
$A + A \to 0$ reaction, whose kinetic equation is: 

$$ \frac{1}{2} \frac{dA(t)}{dt} = -k [A(t)]^2 . \tag{9} $$

Separation of the variables in (9) and integration lead to: 

$$ A(t) = \frac{A_0}{1 + 2 A_0 k t} , \tag{10} $$

a form very akin to (7). The long-time behavior obeys here 

$$ A(t) \simeq \frac{1}{2kt} . \tag{11} $$

Thus, for unimolecular and for bimolecular reactions the kinetic scheme yields
long-time decays that are either exponential or algebraic with a $1/t$ dependence. 
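
The closed forms (5) and (7) can be checked against a direct numerical integration of (4). The following is a minimal sketch with hypothetical function names; it uses the conserved difference C = B - A to reduce (4) to a single equation and a classical fourth-order Runge-Kutta step, which is a choice made here for illustration and not a method taken from the text.

```python
import math


def rk4_ab(a0, b0, k, t_end, steps):
    """Integrate dA/dt = dB/dt = -k*A*B and return A(t_end)."""
    c = b0 - a0                     # B(t) - A(t) stays constant
    h = t_end / steps

    def rhs(a):
        return -k * a * (a + c)     # B(t) = A(t) + C

    a = a0
    for _ in range(steps):
        s1 = rhs(a)
        s2 = rhs(a + 0.5 * h * s1)
        s3 = rhs(a + 0.5 * h * s2)
        s4 = rhs(a + h * s3)
        a += h * (s1 + 2 * s2 + 2 * s3 + s4) / 6
    return a


def closed_form(a0, b0, k, t):
    """A(t) from (7) for A0 = B0, otherwise from (5) solved for A(t)."""
    c = b0 - a0
    if c == 0:
        return a0 / (1 + a0 * k * t)
    return c / ((1 + c / a0) * math.exp(c * k * t) - 1)


if __name__ == "__main__":
    for a0, b0 in ((1.0, 1.0), (1.0, 2.0)):
        num = rk4_ab(a0, b0, k=1.0, t_end=1.0, steps=1000)
        print(a0, b0, num, closed_form(a0, b0, 1.0, 1.0))
```

For equal initial concentrations the numerical result follows the algebraic decay of (7), for unequal ones the quasiexponential form contained in (5).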



3 Numerical and Analytical Approaches 
for Reactions Under Diffusion 



The problem with the $1/t$ dependencies of bimolecular reactions is that the 
finding is incorrect. Starting from simulations and from the better analytical 
understanding of the problem, as initiated in [2-6], we nowadays know that the 
$A + B \to 0$ reaction obeys asymptotically 

$$ A(t) \sim \frac{1}{t^{d/4}} \tag{12} $$



as long as d < 4. This is due to the fact that even under homogeneous initial 




Fig. 1. Segregation in the $A + B \to 0$ reaction. The reaction takes place on a square 
lattice of 1000 x 1000 sites and with periodic boundary conditions. A full lattice is
considered as initial condition. The picture shows a snapshot after 10^ time steps; roughly 
1000 particles are displayed. The positions of the A and B particles are indicated by 
vertical and horizontal bars, respectively 




Simulations of Chemical Reactions 105 



conditions the microscopic reaction steps by themselves create nonhomogeneities 
and enhance already-existing density fluctuations. During the reaction large
regions (clusters) containing only A or only B particles appear, see Fig. 1. This 
many-particle aspect of the problem was not recognized in the classical
treatments of the problem, e.g., [7-10], which focused on the derivation of the
reaction rate $k$ from binary collisions. What then is the reason behind the simple 
finding of (11)? The basic assumption underlying the general kinetic scheme of 
(2) is the “well-stirred reactor” model, in which all spatial dependencies due 
to the positions of discrete particles are neglected. Thus the use of (2) implies 
a homogeneous spatial distribution of particles during the whole course of the 
reaction. As discussed above, such an assumption is untenable in general, since 
nonhomogeneous conditions are widespread. Diffusion can partly wipe 
out density fluctuation effects [2-6,11-17], but only when the diffusion length is 
large compared to the mean cluster size. At low particle densities, diffusion (or 
stirring) cannot create a homogeneous background. 

What about introducing the local particle densities $A(\mathbf{x}, t)$ and $B(\mathbf{x}, t)$ into 
the analytical picture? As we proceed to show, this indeed helps reproduce (12), 
but the results are still only qualitative. The idea is to start [2,3] from the 
following coupled diffusion-reaction equations: 

$$ \frac{\partial A(\mathbf{x}, t)}{\partial t} = D \nabla^2 A(\mathbf{x}, t) - k A(\mathbf{x}, t) B(\mathbf{x}, t) \tag{13} $$

and 

$$ \frac{\partial B(\mathbf{x}, t)}{\partial t} = D \nabla^2 B(\mathbf{x}, t) - k A(\mathbf{x}, t) B(\mathbf{x}, t) , \tag{14} $$

where $D$ is the diffusion coefficient, and $k$ denotes the local bimolecular reaction 
rate. The connection to $A(t)$ and $B(t)$ is given by $A(t) = \langle A(\mathbf{x}, t) \rangle_{\mathbf{x}}$ and $B(t) = 
\langle B(\mathbf{x}, t) \rangle_{\mathbf{x}}$, i.e., by the spatial average. Equations (13) and (14) are far from being 
exact, since they are restricted to first-order density functions. Consequently, the 
reaction term is only approximate, since at least the joint probability density 
of A-B pairs is needed for a correct description [18-20]. We focus here on the case 
$A_0 = B_0$, which implies $A(t) = B(t)$ at all times. The analysis of (13) and (14) 
is simplified by setting $q(\mathbf{x}, t) = A(\mathbf{x}, t) - B(\mathbf{x}, t)$ and $s(\mathbf{x}, t) = A(\mathbf{x}, t) + B(\mathbf{x}, t)$, 
which leads to 

$$ \frac{\partial q(\mathbf{x}, t)}{\partial t} = D \nabla^2 q(\mathbf{x}, t) \tag{15} $$

and to 

$$ \frac{\partial s(\mathbf{x}, t)}{\partial t} = D \nabla^2 s(\mathbf{x}, t) - \frac{k}{2} \left[ s^2(\mathbf{x}, t) - q^2(\mathbf{x}, t) \right] . \tag{16} $$

We point out that (15) holds exactly, irrespective of the approximation
introduced in the description of the reaction term. One has now $A(t) = \langle A(\mathbf{x}, t) \rangle_{\mathbf{x}} = 
\frac{1}{2} \langle s(\mathbf{x}, t) \rangle_{\mathbf{x}}$. 

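It is furthermore of interest to have the expressions corresponding to (15) 
and (16) also for discrete lattices, since this allows the generalization of the
procedure to any connected underlying system and especially to fractal lattices. The 
discrete version of (15) and (16) is the following system of coupled
differential-difference equations [19,20]: 

$$ \frac{dq(\mathbf{x}_j, t)}{dt} = \Gamma \sum_{i \in \sigma_j} \left[ q(\mathbf{x}_i, t) - q(\mathbf{x}_j, t) \right] \tag{17} $$

and 

$$ \frac{ds(\mathbf{x}_j, t)}{dt} = \Gamma \sum_{i \in \sigma_j} \left[ s(\mathbf{x}_i, t) - s(\mathbf{x}_j, t) \right] - \frac{k}{2} \left[ s^2(\mathbf{x}_j, t) - q^2(\mathbf{x}_j, t) \right] . \tag{18} $$

In (17) and (18) $\mathbf{x}_j$ are the sites, $\sigma_j$ denotes the set of nearest neighbors of site $j$, 
and $\Gamma$ is the hopping rate between nearest-neighbor sites. Note that (17) is linear, 
and its solution can be expressed through the Green’s function $G(\mathbf{x}_j, t; \mathbf{x}_0, 0)$, 
the conditional probability to be at $\mathbf{x}_j$ at time $t$ having started at $\mathbf{x}_0$ at time 
zero. One has 

$$ q(\mathbf{x}_j, t) = \sum_i G(\mathbf{x}_j, t; \mathbf{x}_i, 0) \, q(\mathbf{x}_i, 0) , \tag{19} $$

where $q(\mathbf{x}, 0)$ denotes the initial random configuration. From (19) all moments 
follow readily. Thus $\langle q(\mathbf{x}, t) \rangle_{\mathbf{x}}$ is zero at all times; the second moment, $\langle q^2(t) \rangle$, 
obtained by averaging over the lattice sites and over all initial configurations (ic) is 

$$ \langle q^2(t) \rangle = \langle q^2(\mathbf{x}, t) \rangle_{\mathbf{x},\mathrm{ic}} = N^{-1} \sum_j \left\langle \Big[ \sum_i G(\mathbf{x}_j, t; \mathbf{x}_i, 0) \, q(\mathbf{x}_i, 0) \Big]^2 \right\rangle_{\mathrm{ic}} . \tag{20} $$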



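Here $N$ denotes the number of sites of the lattice considered. Letting $q_0/2$ be the 
initial occupation probability for A or B particles, one has as initial distribution 

$$ q(\mathbf{x}_i, 0) = \begin{cases} +1 & \text{with probability } q_0/2 \\ -1 & \text{with probability } q_0/2 \\ \phantom{+}0 & \text{with probability } 1 - q_0 \end{cases} \tag{21} $$

It is now straightforward to calculate $\langle q^2(t) \rangle$ using (20) and (21), since the 
$G(\mathbf{x}_j, t; \mathbf{x}_i, 0)$ are independent of the initial configuration: 

$$ \begin{aligned}
\langle q^2(t) \rangle &= \frac{1}{N} \sum_{j,i,k} G(\mathbf{x}_j, t; \mathbf{x}_i, 0) \, G(\mathbf{x}_j, t; \mathbf{x}_k, 0) \, \langle q(\mathbf{x}_i, 0) \, q(\mathbf{x}_k, 0) \rangle_{\mathrm{ic}} \\
&= \frac{q_0}{N} \sum_{j,i,k} G(\mathbf{x}_j, t; \mathbf{x}_i, 0) \, G(\mathbf{x}_j, t; \mathbf{x}_k, 0) \, \delta_{ik} \\
&= \frac{q_0}{N} \sum_{j,i} G^2(\mathbf{x}_j, t; \mathbf{x}_i, 0) .
\end{aligned} \tag{22} $$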

To proceed, we consider the Chapman-Kolmogorov equation for $0 < t' < t$: 

$$ G(\mathbf{x}_j, t; \mathbf{x}_i, 0) = \sum_m G(\mathbf{x}_j, t; \mathbf{x}_m, t') \, G(\mathbf{x}_m, t'; \mathbf{x}_i, 0) , \tag{23} $$

which is obeyed by all Markov processes (irrespective of lattice structure and 
dimension, i.e., it holds also for fractals) [19]. Noticing furthermore that the 
Green’s function is symmetrical, see [19], 

$$ G(\mathbf{x}_j, t; \mathbf{x}_i, 0) = G(\mathbf{x}_i, t; \mathbf{x}_j, 0) , \tag{24} $$

one finds from (22) to (24) that 

$$ \langle q^2(t) \rangle = \frac{q_0}{N} \sum_j G(\mathbf{x}_j, 2t; \mathbf{x}_j, 0) = q_0 \, G(0, 2t) , \tag{25} $$

where $G(0, t)$ is the probability of being at the origin after time $t$, averaged over 
all starting sites. Equation (25) relates $\langle q^2(t) \rangle$ to the well-understood
autocorrelation function $G(0, t)$, whose leading behavior follows asymptotically [21,22] the 
power law $G(0, t) \simeq a_d \, t^{-\tilde{d}/2}$, where $\tilde{d}$ is the spectral dimension ($\tilde{d} = d$ for Euclidean 
lattices), and the prefactor $a_d$ is lattice dependent. We continue by discussing 
the implications of (25) for the density decay. From (19) and (21) one can view 
$q$ as being a large sum of terms which are either $\pm 1$ or zero, weighted with the 
corresponding $G$ factors; thus, at long times the central limit theorem holds so 
that $q(\mathbf{x}, t)$ approaches a Gaussian distribution [3,14]. It follows that 

$$ q(t) = \langle |q(\mathbf{x}, t)| \rangle_{\mathbf{x}} = \left[ \frac{2}{\pi} \langle q^2(t) \rangle \right]^{1/2} . \tag{26} $$

Furthermore, one expects from (16) that for very large $k$, one has $s^2(\mathbf{x}, t) \simeq 
q^2(\mathbf{x}, t)$, which means physically that the particles segregate in clusters [2,3]. 
Approximating hence $s(t) = \langle s(\mathbf{x}, t) \rangle_{\mathbf{x}}$ through $q(t)$ and using (25), it follows 
that 

$$ A(t) = \langle A(\mathbf{x}, t) \rangle_{\mathbf{x}} = \tfrac{1}{2} s(t) \ge \tfrac{1}{2} q(t) = \left[ q_0 \, G(0, 2t) / (2\pi) \right]^{1/2} , \tag{27} $$

which, considering the power-law description for $G(0, t)$, leads to 

$$ A(t) \ge C_d \, t^{-\tilde{d}/4} . \tag{28} $$

The constant $C_d$ is fixed by $q_0$ and the prefactor $a_d$; for Euclidean lattices it is
expressed through $\tau$, the hopping time. We have now to establish just how far
setting $q(t) \simeq s(t)$ is justified [3,14,23]. 
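
The step from (22) to (25) uses only the Chapman-Kolmogorov relation and the symmetry of the Green's function. It can be checked numerically in a simplified setting. The sketch below assumes a discrete-time lazy random walk on a small ring instead of the continuous-time hopping of (17); the function names and the hopping probability are hypothetical choices made for illustration.

```python
def step_matrix(n, hop=0.25):
    """One-step transition matrix of a lazy random walk on a ring of n sites."""
    p = [[0.0] * n for _ in range(n)]
    for j in range(n):
        p[j][j] += 1.0 - 2.0 * hop          # stay put
        p[j][(j + 1) % n] += hop            # hop to the right neighbor
        p[j][(j - 1) % n] += hop            # hop to the left neighbor
    return p


def matmul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def propagator(n, steps, hop=0.25):
    """Green's function after `steps` time steps, G[j][i] = G(x_j, t; x_i, 0)."""
    g = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    p = step_matrix(n, hop)
    for _ in range(steps):
        g = matmul(p, g)
    return g


def return_identity(n, steps):
    """Return (1/N) sum_{j,i} G^2(x_j,t;x_i,0) and (1/N) sum_j G(x_j,2t;x_j,0)."""
    g_t = propagator(n, steps)
    g_2t = propagator(n, 2 * steps)
    lhs = sum(g_t[j][i] ** 2 for j in range(n) for i in range(n)) / n
    rhs = sum(g_2t[j][j] for j in range(n)) / n
    return lhs, rhs


if __name__ == "__main__":
    print(return_identity(7, 5))
```

Both numbers agree up to rounding, which is the content of (25) apart from the factor $q_0$.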

To settle the question, one has to center on (16), which, averaged over all 
lattice sites, gives 

$$ \frac{ds(t)}{dt} = -\frac{k}{2} \left[ \langle s^2(t) \rangle - \langle q^2(t) \rangle \right] . \tag{29} $$

Here it is of interest to see if and how fast the ratio $\langle s^2(t) \rangle / \langle q^2(t) \rangle$ tends towards 
unity. Furthermore, we verify to what extent $s(\mathbf{x}, t)$ and $q(\mathbf{x}, t)$ are Gaussian 
distributed. A measure for this is how fast the ratios $\langle q^2(t) \rangle^{1/2} / \langle |q(t)| \rangle$ and 
$\langle s^2(t) \rangle^{1/2} / \langle s(t) \rangle$ reach the asymptotic value $(\pi/2)^{1/2}$. 

One should note that for $d > 4$, (28) gives as lower bound $t^{-d/4}$, i.e., a 
decaying form faster than the kinetically expected $t^{-1}$, see (8). We are thus led 
to expect, as usual, the marginal dimension of the process to be $d = 4$. It then 
follows that the ratio $\langle s^2(t) \rangle / \langle q^2(t) \rangle$ should diverge for $d > 4$ [22,24]. On the 
other hand, convergence of this ratio means that an upper bound to $A(t)$ is also 
given by an expression akin (apart from the prefactor) to (28). We studied these 
points by solving (13) and (14) numerically. For the numerical treatment the 
ratio of $k$ to the hopping rate has to be fixed; for comparison to former approaches [23]
we took $k$ as being equal to $2/\tau$. 
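
A numerical treatment of this kind can be sketched for a one-dimensional ring by integrating the discrete equations (17) and (18). The sketch below is an illustration under stated assumptions, not the authors' code: it uses a simple explicit Euler step, sets the hopping rate to 1 and k to 2 in the same units, and uses hypothetical function names, lattice size and time step.

```python
import random


def initial_q(n_sites, q0, rng):
    """Random initial configuration (21): +1 (A), -1 (B) or 0 (empty)."""
    q = []
    for _ in range(n_sites):
        u = rng.random()
        if u < q0 / 2:
            q.append(1.0)
        elif u < q0:
            q.append(-1.0)
        else:
            q.append(0.0)
    return q


def euler_step(q, s, gamma, k, dt):
    """One explicit Euler step of (17) and (18) on a ring."""
    n = len(q)
    new_q, new_s = [], []
    for j in range(n):
        left, right = (j - 1) % n, (j + 1) % n
        hop_q = q[left] + q[right] - 2.0 * q[j]
        hop_s = s[left] + s[right] - 2.0 * s[j]
        new_q.append(q[j] + dt * gamma * hop_q)
        new_s.append(s[j] + dt * (gamma * hop_s
                                  - 0.5 * k * (s[j] ** 2 - q[j] ** 2)))
    return new_q, new_s


def integrate(n_sites=200, q0=0.1, gamma=1.0, k=2.0, dt=0.05,
              steps=400, seed=3):
    """Return the initial q and the fields q, s after `steps` Euler steps."""
    rng = random.Random(seed)
    q_start = initial_q(n_sites, q0, rng)
    q = list(q_start)
    s = [abs(x) for x in q_start]   # each site holds one A, one B, or nothing
    for _ in range(steps):
        q, s = euler_step(q, s, gamma, k, dt)
    return q_start, q, s


if __name__ == "__main__":
    q_start, q, s = integrate()
    n = len(q)
    print("<s> initially:", sum(abs(x) for x in q_start) / n)
    print("<s> finally:  ", sum(s) / n)
    print("<|q|> finally:", sum(abs(x) for x in q) / n)
```

The sum of q over all sites is conserved by (17), whereas the mean of s can only decrease, in line with (29).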

In Fig. 2 various quantities are shown for $d = 1$. To demonstrate the
region of long-time behavior clearly, the quantities were divided by their
expected asymptotic forms, such that the asymptotic patterns appear as
horizontal lines. A lattice of 4 x 10^ sites was used. Plotted are $\langle |q(t)| \rangle \, t^{1/4} / (2C_1)$ and 
$\langle s(t) \rangle \, t^{1/4} / (2C_1)$. The displayed curves demonstrate that $\langle |q(t)| \rangle$ quickly reaches 
the asymptotic regime, whereas $\langle s(t) \rangle$ relaxes considerably more slowly. In the 
region of moderate times $\langle |q(t)| \rangle$ and $\langle s(t) \rangle$ differ significantly, and $\langle s(t) \rangle$, as 
presented in Fig. 2, shows a characteristic hump. 




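Fig. 2. Results in $d = 1$ from Monte Carlo (MC) simulations and from the deterministic 
approach [19]. MC result: (a) $A_{\mathrm{MC}}(t) \, t^{1/4} / C_1$. Deterministic results: (b) $\langle s(t) \rangle \, t^{1/4} / (2C_1)$, 
(c) $\langle |q(t)| \rangle \, t^{1/4} / (2C_1)$, (d) $\langle s^2(t) \rangle / \langle q^2(t) \rangle$, (e) $\langle s(t) \rangle / \langle s^2(t) \rangle^{1/2}$, (f) $\langle |q(t)| \rangle / \langle q^2(t) \rangle^{1/2}$. 
The initial concentrations are $A_0 = 0.05$ and $B_0 = 0.05$; the dash-dotted lines indicate 
the values 1 and $(\pi/2)^{1/2}$ 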



These patterns are compared with results taken from Monte Carlo (MC) 
calculations in which initially equal numbers of A and B particles were placed 
randomly on the lattice and typically 10^ to 10^ particles were used. Then a 
particle was picked randomly and was moved to a next neighbor position while 
simultaneously the time was incremented by the inverse of the number of par- 
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tides still present in the sample. If one particle attempted to move into a site 
occupied by a particle of the opposite species, then both particles were removed 
from the lattice. The simulation results plotted as /Ci [curve (a) 

in Fig. 2] show the same characteristic behavior as {s{t)) [curve (b)]; however, 
^Mc(^) relaxes significantly more slowly to its asymptotic value than 
We view these differences between MC and deterministic data as resulting from 
the approximations introduced in the diffusion-reaction equations (13) and (14), 
which are thus limited in their ability to describe processes as complicated as 
particle annihilation. To complete the analysis, we also display in Fig. 2 the ratio
$\langle s^2(t)\rangle/\langle q^2(t)\rangle$ [curve (d)], which shows a slow convergence to the value 1. Finally,
also plotted are the two ratios $\langle q^2(t)\rangle^{1/2}/\langle|q(t)|\rangle$ and $\langle s^2(t)\rangle^{1/2}/\langle s(t)\rangle$. Both
ratios converge to the asymptotic value of $(\pi/2)^{1/2}$, which is consistent with q and
s being Gaussian distributed at long times. Again the sum variable relaxes more
slowly than the difference variable to its limiting value.
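
The Monte Carlo procedure described above can be sketched as follows. This is a minimal illustration and not the program used in [19]: the function name, the periodic boundary, the default sizes, and the choice to let like particles share a site are all assumptions made here.

```python
import random

def simulate_annihilation(n_sites=1000, n_each=100, t_max=20.0, seed=0):
    """Monte Carlo sketch of A + B -> 0 on a periodic 1D lattice.

    Returns a list of (time, number of A particles) pairs.
    Assumptions: periodic boundaries; like particles may share a site.
    """
    rng = random.Random(seed)
    # Place equal numbers of A (+1) and B (-1) particles on distinct random sites.
    sites = rng.sample(range(n_sites), 2 * n_each)
    particles = [[x, +1] for x in sites[:n_each]] + [[x, -1] for x in sites[n_each:]]
    t, n_a = 0.0, n_each
    history = [(t, n_a)]
    while particles and t < t_max:
        n = len(particles)
        i = rng.randrange(n)          # pick a particle at random
        t += 1.0 / n                  # time advances by 1 / (particles present)
        x, s = particles[i]
        x_new = (x + rng.choice((-1, 1))) % n_sites
        # Look for a particle of the opposite species on the target site.
        j = next((m for m, (y, u) in enumerate(particles)
                  if y == x_new and u == -s), None)
        if j is None:
            particles[i][0] = x_new   # ordinary move
        else:
            for idx in sorted((i, j), reverse=True):
                particles.pop(idx)    # both particles are removed
            n_a -= 1
        history.append((t, n_a))
    return history
```

The linear search for a reaction partner keeps the sketch short; for the large particle numbers quoted in the text a per-site occupancy table would be the natural replacement.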




Fig. 3. The ratio $\langle s^2(t)\rangle/\langle q^2(t)\rangle$ for hypercubic lattices in one to five dimensions [19]



For a display of the situation for Sierpinski gaskets, see [19]. Here we close
by showing in Fig. 3 the ratio $\langle s^2(t)\rangle/\langle q^2(t)\rangle$ for Euclidean lattices in 1, 2, 3, 4, and
5 dimensions. As is obvious from the figure, the behavior in d = 4 is marginal.

4 Reactions in Layered Systems 

In this section and the next we display several results for reactions under stirring 
conditions, in which we highlight the interplay between simulation procedures
and analytical approaches [25]. Stirring is of common occurrence both in chemical
technology and also in everyday life. Laminar mixing, as found in viscous liquids, 
often occurs in polymer and glass processing and is the general feature in the 
physics of the Earth. 

In this section we study reactions in premixed layered systems, and discuss 
the situations which evolve under mixing in the next section. In a premixed 
system the reaction is switched on after the mixing procedure has ended. As in 
Sect. 3 we focus on equal numbers of A and B particles, whose temporal evolution 
is governed by (13) and (14), restricted, because of the layered geometry, to one 
dimension. 

This situation was first considered numerically by Muzzio and Ottino [11-13]
and then analytically; our investigations [14,15] are analogous to the method
reported below, whereas in [26] a perturbative approach was used.

We begin by discussing the numerical algorithms and the results of the sim- 
ulations [25]. The starting point is a one-dimensional array of striations, which 
is a particular realization of a striation thickness distribution (STD). The initial 
concentrations of the reactants in the layers are taken to be equal. This means
that initially one has at each site x either $A(x,0) = q_0$ and $B(x,0) = 0$ or vice
versa. For the stoichiometric case considered, the total number of sites occupied
by the A particles is equal to that occupied by the B particles. The reaction
is then modeled according to (13) and (14), whose solution can be obtained 
through different numerical schemes. 

Thus in [11-13] a two-substep numerical procedure was used. The method is
applicable both for finite and for infinite k. At each time step t of the procedure
the diffusion equations were first solved in the absence of any reaction; this leads
to some auxiliary functions $\tilde{A}(x,t+\Delta t)$ and $\tilde{B}(x,t+\Delta t)$. The second substep
lets the reaction proceed in the absence of diffusion. In the case $k \to \infty$ this
leads to the assignment:

\[ A = \tilde{A} - \tilde{B} \quad\text{and}\quad B = 0 \quad\text{if}\quad \tilde{A} > \tilde{B} \tag{30} \]

and to

\[ B = \tilde{B} - \tilde{A} \quad\text{and}\quad A = 0 \quad\text{if}\quad \tilde{B} > \tilde{A} . \tag{31} \]

In (30) and (31) the quantities $A$, $B$, $\tilde{A}$ and $\tilde{B}$ are taken at the site x of the
discretized system and at time $t + \Delta t$. In the case $k < \infty$ the rules are more
involved, and read:

\[ A = \frac{\tilde{A} - \tilde{B}}{1 - \gamma\,\tilde{B}/\tilde{A}} \tag{32} \]

and

\[ B = \frac{\tilde{B} - \tilde{A}}{1 - \gamma^{-1}\tilde{A}/\tilde{B}} \tag{33} \]

with $\gamma = \exp[k\,\Delta t\,(\tilde{B} - \tilde{A})]$. The main result of [11-13] is that both for finite and
for infinite k the long-time asymptotic behavior of the averaged concentration
$A(t) = \langle A(x,t)\rangle = \langle B(x,t)\rangle = B(t)$ follows a power law



\[ A(t) = B(t) \sim t^{-1/4} , \tag{34} \]

as is typical for d = 1 systems, see (12).

The numerical work of [11-13] and [26] also gives information on the spatial
structures which emerge from the reaction. Thus, the total number N of striations
decreases according to $N(t) \sim t^{-1/2}$. The numerical analysis also shows
that the STD scales as a function of the dimensionless parameter $\eta = \sigma/L(t)$;
here $\sigma$ is the width of a particular striation and $L(t)$ is the mean width at time
t.
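
The two-substep procedure of (30)-(33) can be sketched as follows. The source does not say how the diffusion substep was solved in [11-13]; the explicit finite-difference step with periodic boundaries used here, and all function names, are assumptions of this sketch. The reaction substep is the exact local solution of dA/dt = dB/dt = -kAB over one time step, which conserves A - B.

```python
import math

def diffuse(c, D, dt):
    """One explicit diffusion step on a periodic 1D lattice (unit spacing)."""
    n = len(c)
    return [c[i] + D * dt * (c[(i + 1) % n] + c[i - 1] - 2.0 * c[i])
            for i in range(n)]

def react(a, b, k, dt):
    """Local reaction substep applied to the auxiliary values a, b."""
    if math.isinf(k):                      # rules (30) and (31)
        return (a - b, 0.0) if a > b else (0.0, b - a)
    if a == 0.0 or b == 0.0:               # nothing to react with
        return a, b
    if abs(a - b) <= 1e-12 * max(a, b):    # limit of equal concentrations
        c = a / (1.0 + k * dt * a)
        return c, c
    gamma = math.exp(k * dt * (b - a))     # rules (32) and (33)
    return ((a - b) / (1.0 - gamma * b / a),
            (b - a) / (1.0 - a / (gamma * b)))

def two_substep(A, B, D, k, dt):
    """Diffusion without reaction, then reaction without diffusion."""
    A_aux, B_aux = diffuse(A, D, dt), diffuse(B, D, dt)
    pairs = [react(a, b, k, dt) for a, b in zip(A_aux, B_aux)]
    return [p[0] for p in pairs], [p[1] for p in pairs]
```

Passing `k=math.inf` selects the instantaneous-reaction rules; the explicit diffusion step requires `D * dt <= 0.5` for stability.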

In our own calculation we used a one-step algorithm for the diffusion-reaction
problem [25]. In this case the time evolution of $A(x,t)$ and $B(x,t)$ is a discrete
version ($x \to n$) of (13) and (14). The algorithm reads [25]:

\[ A(n,t+\Delta t) = A(n,t) + \Delta t\,\{D\,[A(n+1,t) + A(n-1,t) - 2A(n,t)] - kA(n,t)B(n,t)\} \tag{35} \]

and

\[ B(n,t+\Delta t) = B(n,t) + \Delta t\,\{D\,[B(n+1,t) + B(n-1,t) - 2B(n,t)] - kA(n,t)B(n,t)\} . \tag{36} \]

For a small enough $\Delta t$ this algorithm is stable for small and medium values of k.
We discuss the results after considering the analytical approach to the problem, 
which, as before, is based on the sum and difference scheme. 
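
A direct transcription of (35) and (36) might look like the sketch below. The periodic boundary condition, the helper `striations` that builds an alternating layered initial state, and the parameter values in the usage note are assumptions made for illustration, not details taken from [25].

```python
def one_step(A, B, D, k, dt):
    """Advance A and B by one time step following (35) and (36)."""
    n = len(A)
    A_new, B_new = [0.0] * n, [0.0] * n
    for i in range(n):
        left, right = (i - 1) % n, (i + 1) % n   # periodic neighbors (assumption)
        loss = k * A[i] * B[i]                   # reaction term k A B
        A_new[i] = A[i] + dt * (D * (A[right] + A[left] - 2.0 * A[i]) - loss)
        B_new[i] = B[i] + dt * (D * (B[right] + B[left] - 2.0 * B[i]) - loss)
    return A_new, B_new

def striations(width, count, q0=1.0):
    """Alternating A and B layers of equal width (a simple, regular STD)."""
    A, B = [], []
    for s in range(count):
        A += [q0 if s % 2 == 0 else 0.0] * width
        B += [0.0 if s % 2 == 0 else q0] * width
    return A, B
```

For example, `A, B = striations(5, 4)` followed by repeated calls `A, B = one_step(A, B, 0.25, 0.05, 0.5)` evolves four layers; the difference A - B summed over the lattice stays constant, as it must for the difference variable q.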

The main change from Sect. 3 is that now the system is practically one-dimensional
and the initial distribution of reactants follows a STD, and not the
(uncorrelated) prescription of (21). If the initial STD has a finite correlation
length $\lambda$, the distribution of q at long-enough times will again be Gaussian. In
a continuous picture one has, namely,

\[ q(x,t) = \int_{-\infty}^{\infty} q(\xi,0)\,G(x-\xi,t)\,d\xi , \tag{37} \]

where $G(x,t) = (4\pi Dt)^{-1/2}\exp(-x^2/4Dt)$ is the (continuous) Green’s function
of the diffusion equation in 1D. The characteristic width of this bell-like
function is $L_D \sim (Dt)^{1/2}$. Therefore, for large enough t, when $L_D \gg \lambda$, the
integral (37) can be viewed as being a sum of $L_D/\lambda$ independent terms having zero
mean and finite dispersion. Under the conditions of the central-limit theorem,
the distribution of this sum will then be Gaussian. In a similar way, one can
also make plausible that $q(x,t)$, viewed as a random function of the coordinate
x at a fixed time t, corresponds to a Gaussian random process [27]. Such a process
is fully characterized by its average value $\langle q(x,t)\rangle$ and by the correlation
function $R_t(x_1,x_2) = \langle q(x_1,t)\,q(x_2,t)\rangle$. In our case we have as ensemble average
$\langle q(x,t)\rangle = 0$ for all t and

\[ R_t(x_1,x_2) = \iint G(x_1-\xi,t)\,\langle q(\xi,0)\,q(\eta,0)\rangle\,G(x_2-\eta,t)\,d\xi\,d\eta . \tag{38} \]

In the case when the two first moments $L = \langle x\rangle$ and $S = \langle x^2\rangle$ of the STD exist
and one also has $S \neq L^2$, the long-time (Gaussian) regime of (38) gives:

\[ R_t(x) = \frac{\Gamma}{(8\pi Dt)^{1/2}}\exp\left(-\frac{x^2}{8Dt}\right) , \tag{39} \]

where the constant $\Gamma$ equals $\Gamma = q_0^2(S - L^2)/L$ and we set $R_t(x) = R_t(x,0)$.
Furthermore, the concentration decay law in the long-time regime is:

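\[ A(t) = \tfrac{1}{2}\langle|q|\rangle = \tfrac{1}{2}\int |q|\,P(q)\,dq = \left[\frac{R_t(0)}{2\pi}\right]^{1/2} \sim t^{-1/4} , \tag{40} \]

as can be seen by inserting into (40) the explicit form of the q distribution,
$P(q) = (2\pi)^{-1/2}\sigma^{-1}\exp[-(q-\mu)^2/(2\sigma^2)]$ with $\mu = \langle q\rangle = 0$ and $\sigma^2 = R_t(0)$. One
may note the emergence of the expected $t^{-1/4}$ behavior in (40).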
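
The Gaussian average entering (40) can be evaluated in closed form. As a short worked step, assuming the conventionally normalized Gaussian with zero mean and variance $\sigma^2 = R_t(0)$:

```latex
\langle |q| \rangle
  = 2\int_0^{\infty} \frac{q}{\sqrt{2\pi}\,\sigma}\,
      e^{-q^2/(2\sigma^2)}\,dq
  = \frac{2\sigma^2}{\sqrt{2\pi}\,\sigma}
  = \sigma\sqrt{\frac{2}{\pi}} ,
\qquad
A(t) = \tfrac{1}{2}\langle |q| \rangle = \frac{\sigma}{\sqrt{2\pi}} .
```

Since $\sigma^2 = R_t(0)$ decays as $t^{-1/2}$ in one dimension, the concentration follows $A(t) \sim t^{-1/4}$, in line with (34).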

Of interest for finite reaction rates k is the behavior of $\langle s(t)\rangle$, see (29). For
this we put $\langle s(t)\rangle = \langle s^2(t)\rangle^{1/2} = s$ (see [16] for a discussion) and use the explicit
expression for $\langle q^2\rangle$ obtained with the help of $P(q)$. In [16] we found that for
intermediate times $s(t) \sim t^{-1}$ (classical behavior), while at long times $s(t) \sim t^{-1/4}$.

Figure 4 shows the results of the numerical modeling of a reacting system
with a finite k. For comparison the analytical long-time behavior of $\langle|q(x,t)|\rangle$ is
also given. One sees that at longer times the curves for $s(t)$ and $\langle|q(t)|\rangle$ merge
and that they follow the analytical asymptotic form. Therefore the long-time
kinetic behavior is determined by the fluctuations of the striation distribution.

The analytical approach used here also makes possible the investigation of
the evolution of the clusters during the reaction, see [15]. Both the analytical
and the numerical analysis show that for nonvanishing k the system may be
viewed as consisting of well-defined striations of A and B particles and that the
reaction takes place within thin regions close to the clusters’ boundaries. This
confirms our picture that at longer times $s(x,t)$ tends to $|q(x,t)|$, or equivalently
that the equations

\[ A(x,t) = q(x,t)\,\Theta(q(x,t)) \tag{41} \]

and

\[ B(x,t) = -q(x,t)\,\Theta(-q(x,t)) \tag{42} \]

hold asymptotically.





Here we view the clusters as being intervals between two subsequent simple
roots $q(x,t) = 0$ of the difference variable q; hence the distribution of such
clusters is equivalent to the zero-level crossing (ZLC) problem in the theory
of random processes, see for example [27]. The solution of the ZLC problem
gives the joint probability density $P(x_1,x_2)$ of finding a ZLC at the point $x_1$,
provided there is a ZLC at the point $x_2$. The calculation proceeds by using
the joint probability distribution $P(q_1,q_2,u_1,u_2)$ of the function q and of its
derivative u at the two points $x_1$ and $x_2$. According to Rice’s formula [28] for
the density of ZLC, $n = [-R_t''(0)/R_t(0)]^{1/2}/\pi$, one obtains $n = 1/(2\pi\sqrt{Dt})$
and therefore $L(t) = n^{-1} \sim t^{1/2}$. The STD $p(x,t)$ at long times can also be
expressed through $P(x_1,x_2)$, see [15]. From this expression the scaling form
$p(x,t) = (8Dt)^{-1/2}\,\zeta(x/\sqrt{8Dt})$ follows, the function $\zeta(\xi)$ being universal. This
scaling form is confirmed by direct computer modeling of the time evolution
of the difference variable, see Fig. 5. Furthermore it follows that the function
$g(\sigma,t)$ (the occurrence frequency, in a system of size M, of lamellae of width
$\sigma$, multiplied by $\sigma^2$) scales with the dimensionless variable
$\eta = \sigma/L(t)$; it is precisely this fact which was discovered numerically in [11-13].

Fig. 4. Numerical solutions for s(t) (full line) and q(t) (dashed line) and the analytical
asymptotic form (dotted line) for a system with a STD of width W = 10, and
Da = 1/4 = = 0.05, and A(0) = 1.0 [25]

5 Reactions Under Mixing 

Now we proceed by incorporating stirring aspects into the diffusion-reaction 
scheme. For this we focus on two basic models for mixing: on the one hand the 
Baker’s transformation, which mixes strongly, and, on the other hand, shear- 
flow mixing, which is less effective. These rather simple-looking procedures are, 
however, related to industrially used mixing devices; for an overview one may 
consider [29] and [30]. 

Baker’s transformation is one of the simplest theoretical models for mixing,
see [31]. Each step of Baker’s transformation (which requires, say, a time $\tau$ for
completion) consists of three substeps: (a) squeezing the square to half its initial
width and double height, (b) cutting the obtained object into two parts and
(c) pasting the upper part to the right of the lower one. The second mixing
model (shear flow) can be visualized as consisting of: (a) shearing a square into
a rhombus with an acute angle of $\pi/4$, (b) cutting the rhombus into two equal
right triangles and (c) pasting the right triangle to the left of the left one.

Fig. 5. The scaling function $\zeta(\xi)$ [25] obtained analytically (dashed line) and numerically
(triangles)
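
Both discrete mixing steps can be written as maps of the unit square onto itself. The sketch below is one possible reading of the verbal description; the function names and the choice of the unit square with coordinates in [0, 1) are assumptions made here.

```python
def baker_step(x, y):
    """One step of the baker's transformation on the unit square.

    Squeeze to half width and double height, cut at half height,
    and paste the upper part to the right of the lower one.
    """
    if y < 0.5:
        return 0.5 * x, 2.0 * y
    return 0.5 * x + 0.5, 2.0 * y - 1.0

def shear_step(x, y):
    """One step of shear-flow mixing on the unit square.

    Shear the square by an amount equal to y (acute angle pi/4), then cut
    off the part that left the square and paste it back on the left side.
    """
    return (x + y) % 1.0, y
```

Applying `baker_step` repeatedly to an initial state that varies only along x halves the striation width at every step, which is why this map mixes strongly, whereas `shear_step` thins the layers only algebraically in the number of steps.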

The analytical procedure for mixing can be further simplified by using appropriate
boundary conditions, which correspond to attaching to the initial system
copies of itself along the x direction. In fact one may even go a step further
and use statistical copies of the system, see [32,33]. Under such conditions one
can dispense with the discontinuous step: the infinite domain now undergoes
continuous squeezing for Baker’s transformation and continuous shearing for
shear-flow. In a liquid in motion both cases can be described by introducing
position-dependent velocity fields.

For $\mathbf{v}$ we find the structure $v_i = \sum_j \alpha_{ij} r_j$, where i and j denote the coordinates
and the $\alpha_{ij}$ are position and time independent. Moreover in both cases
$\sum_i \alpha_{ii} = 0$ is obeyed, i.e. the liquid is incompressible, $\nabla\cdot\mathbf{v} = 0$.

In two dimensions one finds for Baker’s transformation that the matrix $\alpha_{ij}$
is diagonal with $\alpha_{xx} = -\alpha_{yy}$. For shear-flow only one element, $\alpha_{xy}$, is nonzero. In
three dimensions the Baker’s transformation has again a diagonal matrix with
$\alpha_{xx} = -\alpha$ and $\alpha_{yy} = \alpha_{zz} = \alpha/2$ ($\alpha > 0$). For shear-flow the matrix $\alpha_{ij}$ again
has only one nonzero element, namely $\alpha_{xy}$.

As an extension of the former formalism a diffusion-controlled reaction in a
moving, incompressible liquid is described by the following pair of differential
equations [11-13,16,17,32,33]:

\[ \dot{A} + \mathbf{v}\cdot\nabla A = D\Delta A - kAB , \tag{43} \]
\[ \dot{B} + \mathbf{v}\cdot\nabla B = D\Delta B - kAB . \tag{44} \]

Here A and B depend both on position and on time: (43) and (44) extend (13)
and (14) through the inclusion of velocity-dependent (drift) terms. As before,
we revert now to the difference q and the sum s of the local concentrations and
obtain:

\[ \dot{q} + \mathbf{v}\cdot\nabla q = D\Delta q \tag{45} \]

and

\[ \dot{s} + \mathbf{v}\cdot\nabla s = D\Delta s - \frac{k}{2}\,(s^2 - q^2) . \tag{46} \]

Thus we can proceed by paralleling the treatment of (17) and (18) above, the 
only difference being now that we need to have the Green’s functions for diffusion 
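under drift. Omitting the details (see [17] for the derivation) one finds for the
Baker’s transformation

\[ G(\mathbf{r},t;\mathbf{r}_0,0) = (2\pi D)^{-d/2}\prod_i\left[\frac{\alpha_{ii}}{\exp(2\alpha_{ii}t)-1}\right]^{1/2}
   \exp\left\{-\sum_i\frac{\alpha_{ii}\,(r_i - r_{0i}\,e^{\alpha_{ii}t})^2}{2D\,[\exp(2\alpha_{ii}t)-1]}\right\} , \tag{47} \]

whereas shear-flow leads in 2D to [17]

\[ G(\mathbf{r},t;\mathbf{r}_0,0) = \frac{\sqrt{3}}{2\pi Dt\,\sqrt{\alpha^2t^2+12}}
   \exp\left\{-\frac{3\,[x-x_0-\frac{\alpha t}{2}(y+y_0)]^2}{Dt\,(\alpha^2t^2+12)} - \frac{(y-y_0)^2}{4Dt}\right\} \tag{48} \]

and in 3D to a more complex form [17]

\[ G(\mathbf{r},t;\mathbf{r}_0,0) = \left(\frac{3}{16\pi^3D^3t^3(\alpha^2t^2+12)}\right)^{1/2}
   \exp\left\{-\frac{3\,[x-x_0-\frac{\alpha t}{2}(y+y_0)]^2}{Dt\,(\alpha^2t^2+12)} - \frac{(y-y_0)^2}{4Dt} - \frac{(z-z_0)^2}{4Dt}\right\} . \tag{49} \]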



These forms can be used to analytically establish the decay forms, which can
then be compared to simulation results. Analytically one finds for Baker’s
transformation at longer times

\[ q^2(t) \sim \exp(-|\alpha_{xx}|\,t) , \tag{50} \]

whereas for shear flow one obtains at longer times algebraic decays

\[ q^2(t) \sim t^{-\beta} , \tag{51} \]

where $\beta = 2$ in 2D and $\beta = 5/2$ in 3D, see [17] for details. Taking also the
quantity $s^2(t)$ into account allows one to show that in general the important initial
stages of the decay (these are experimentally of main interest) are controlled by
stirring. However, the duration of these stages varies, depending on the type of
mixing considered and on the dimension of space in which the reactants move.
On the other hand, one often recovers asymptotically, at long times, the classical
kinetic behavior. Our calculations also allow us to conclude that the classical
kinetic scheme is obeyed only in the limiting case of very effective mixing and
very diluted solutions.

6 Reactions Controlled by Enhanced Diffusion 

Much attention has recently been drawn to systems which display enhanced diffusion,
where the mean-square displacement of a particle grows super-linearly in
time [34-41]. Such enhancement has been experimentally observed, for instance,
in a two-dimensional flow in a rotating annulus [42] and in self-diffusion studies
in polymer-like breakable micelles [43]. In these cases, as well as in a broad range
of numerical studies of dynamical systems, the enhancement has been attributed
to Lévy walks, which generalize simple Brownian motion by extending the
central-limit theorem [38,39,44,45].

We introduce Lévy statistics into reaction dynamics, a step which enables
us to generalize the above-investigated reaction-diffusion schemes by including
motional enhancement, and to demonstrate the continuous approach towards the
mean-field results [40]. In this sense the Lévy-walk enhanced reactions present
another model of simple mixing processes and broaden the scope of applicability
of the above-mentioned reactions. We show that imposing the Lévy-walk aspect
accelerates the reaction process, leads to different reaction patterns and lowers
the critical dimension at which the mean-field behavior sets in.

As above we concentrate here on the transient $A + B \to 0$ reaction with initially
randomly placed A and B particles and with $A_0 = B_0$. The particles are
considered to move at a constant velocity for time periods chosen randomly,
according to a probability density $\psi(t)$. The density $\psi(t)$ is assumed to follow
a power law, $\psi(t) \sim t^{-\gamma-1}$. Here we restrict the range of the power-law
exponents to $1 < \gamma < 2$. In this $\gamma$-regime the diffusion is enhanced, leading to a
mean-squared displacement of a single particle that grows as $\langle r^2(t)\rangle \sim t^{3-\gamma}$.

We follow the analysis of the previous sections and study the time evolution
of the particle densities in terms of the density-difference function, $q(x,t) =
A(x,t) - B(x,t)$; for its time evolution we write [40],

\[ \dot{q}(x,t) = L\,q(x,t) , \tag{52} \]

which is an extension of (15) to the case of enhanced diffusion. L is the operator
which constitutes the Lévy process and which is defined in Fourier space ($x \to k$)
as $\mathcal{F}\{Lf(x)\} = -c|k|^{\gamma}\,\mathcal{F}\{f(x)\}$. The regular diffusion limit is recovered when
$\gamma = 2$; in this case L is the Laplacian and c is the diffusion coefficient.

As in the previous sections we estimate the density of A and B particles from
$A(t) = B(t) \sim \langle|q(x,t)|\rangle$. The latter quantity is obtained from the
moment-generating function $\langle\exp[\phi\,q(x{=}0,t)]\rangle$. At long times one has [3,46,48]

\[ \langle\exp[\phi\,q(x{=}0,t)]\rangle = \exp[\tfrac{1}{2}\phi^2 I(t)] , \tag{53} \]

where the average is taken over all possible realizations of the initial conditions.
This expression can be shown to hold equally well for regular and enhanced diffusion.
It demonstrates that $q(x{=}0,t)$ is Gaussian distributed; therefore asymptotically
$\langle|q(x{=}0,t)|\rangle = (2\langle q^2(x{=}0,t)\rangle/\pi)^{1/2}$ also for enhanced diffusion. $I(t)$ was
shown to be related to the Green’s function [3,48], $I(t) = 2A_0G(x{=}0,2t)$, where
$A_0$ is the initial concentration of the A particles and $G(x,t)$ denotes the Green’s
function for enhanced diffusion. For Lévy walks one has $G(x{=}0,t) \sim t^{-d/\gamma}$ in d
dimensions; we thus obtain

\[ A(t) \sim t^{-d/(2\gamma)} , \qquad d/(2\gamma) < 1 , \tag{54} \]

which for $\gamma = 2$ reduces to the regular result of (12).

Fig. 6. The time evolution of the density A(t) in the one-dimensional $A + B \to 0$ reaction
for nearest-neighbor random walks (NN-RW) and for enhanced diffusion, where the $\gamma$
values are as indicated. Simulation results are given by full lines; the dashed lines are
the predictions according to (54)

Segregation into A-rich and B-rich areas also occurs in reactions controlled
by enhanced diffusion. The segregation is considered to take place on a scale
$\Lambda(t)$. The temporal behavior of $\Lambda(t)$ can be discussed in terms of the
position-position correlation function $\langle q(x,t)\,q(x',t)\rangle$, which was shown to be related to the
Green’s function by [49]

\[ \langle q(x,t)\,q(x',t)\rangle \sim G(x-x',2t) . \tag{55} \]

The behavior of the correlation length is thus equal to that of the characteristic
length of the Green’s function. From the scaling properties, $x \sim t^{1/\gamma}$, of the
Green’s function, we find

\[ \Lambda(t) \sim t^{1/\gamma} . \tag{56} \]

Fig. 7. The segregation length $\Lambda(t)$ as a function of time for the same diffusion-controlled
reactions considered for Fig. 6. Full lines give the simulation results and the
dashed lines indicate the slopes according to $\Lambda(t) \sim t^{1/\gamma}$



In the simulations particles are initially dispersed randomly on the lattice 
and each particle is outfitted with a randomly chosen direction of motion and 
a randomly chosen duration time of constant velocity. After the duration time 
has elapsed a new direction and a new duration time are chosen at random. The 
particles are removed from the system at first encounter with an unlike species. 
Typically 10^-10^ particles are considered at the beginning of the reaction. 
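
The motion of a single particle in such a simulation can be sketched as follows. The sketch covers only the walk, not the reaction, and it rests on assumptions that are not spelled out in the text: the durations are drawn from a Pareto law with tail exponent $\gamma$ (so that the density decays as a power law with exponent $\gamma + 1$), the walk is one-dimensional, and the names `levy_walk`, `t0` and `v` are illustrative.

```python
import random

def levy_walk(gamma, t_total, v=1.0, t0=1.0, seed=0):
    """Position after time t_total of a 1D walker moving at constant speed v.

    Each flight has a random direction and a random duration tau >= t0 with
    P(tau > t) = (t0 / t) ** gamma, sampled by inverse transform (assumption).
    """
    rng = random.Random(seed)
    x = 0.0
    remaining = t_total
    while remaining > 0.0:
        u = 1.0 - rng.random()               # uniform in (0, 1]
        tau = t0 * u ** (-1.0 / gamma)       # power-law distributed duration
        tau = min(tau, remaining)            # truncate the last flight
        x += rng.choice((-1.0, 1.0)) * v * tau
        remaining -= tau
    return x
```

Averaging `levy_walk(gamma, t) ** 2` over many seeds gives an estimate of the mean-squared displacement, which can be compared with the super-linear growth discussed above.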

In Fig. 6 we show $A(t)$ both for regular nearest-neighbor random walks and
also for various diffusional enhancements. The numerical results are compared
with the predictions; a reasonable agreement is observed. The prefactors can also
be derived from the above derivations, and they are included in the predictions
shown in the figure. In Fig. 7 we show the segregation length $\Lambda(t)$ for the
same set of parameters as in Fig. 6. Again the numerical
results follow the predicted slopes reasonably well.

From this analysis we conclude that the enhanced diffusion, when introduced
into diffusion-controlled reactions, manifests itself in a number of ways. As expected,
the reaction is accelerated and the size of the clusters grows faster.
Moreover, whereas in the simple diffusion case the critical dimension at which
the classical rate-equation approach is applicable is $d_c = 4$ [2-6], here, due to
the enhancement, the critical dimension is reduced to $d_c = 2\gamma$. Correspondingly,
segregation, which slows down the reaction, is expected to also disappear for
dimensions $2\gamma < d < 4$.

Abstract. In this paper we give a brief introduction into the fractal concept and
discuss the way the laws of diffusion (mean square displacement as a function of time and
spatial decay of the probability density) are modified on random fractal structures. We 
describe algorithms to generate random fractals and to simulate the diffusion process 
on these structures. We show how the theoretical predictions can be tested by computer 
simulations. 



1 Introduction 

The fractal concept is an important tool for characterizing irregular structures 
in nature that are self-similar on certain length scales [1, 2, 3, 4]. In this pa- 
per we study both analytically and numerically how the laws of diffusion are 
changed in these structures. Accordingly, the paper is divided into three parts. 
In the first part (Sects. 2-4), we discuss certain deterministic and random fractal 
structures that are widely used to mimic irregular structures in nature. In the
second part (Sects. 5 and 6), we consider random walks on fractal structures and 
discuss how the laws of diffusion are changed compared with regular structures. 
In the third part (Sects. 7 and 8) we finally present the numerical methods for 
generating random fractals and simulating random walks on them and describe 
the programs used in the workshop. 

Before starting with irregular structures, we would like to remind the reader
of the concept of dimension in regular systems. It is well known that in regular
systems (with uniform density) such as long wires, large thin plates, or large
filled cubes, the dimension d characterizes how the mass $M(L)$ changes with the
linear size L of the system. If we consider a smaller part of the system of linear
size $bL$ ($b < 1$), then $M(bL)$ is decreased by a factor of $b^d$, i.e.,

\[ M(bL) = b^d M(L) . \tag{1} \]

The solution of the functional equation (1) is simply $M(L) = AL^d$. For the long
wire the mass changes linearly with b, i.e., d = 1. For the thin plates we obtain
d = 2, and for the cubes d = 3.

* Software included on the accompanying diskette.

Next we consider fractal objects. Here we distinguish between deterministic
and random fractals. Deterministic fractals are generated iteratively in a
deterministic way, whereas random fractals are generated using a stochastic process.
Although fractal structures in nature are random, it is instructive to start with
deterministic fractals where the fractal concept can be most easily introduced.

2 Deterministic Fractals 

2.1 The Koch Curve 

One of the most common deterministic fractals is the Koch curve [1, 2, 3]. Fig. 1
shows the first n = 3 iterations of this fractal curve. By each iteration the length
of the curve is increased by a factor of 4/3. The mathematical fractal is defined
in the limit of infinite iterations, $n \to \infty$, where the total length of the curve
approaches infinity.

The dimension of the curve can be obtained just as for regular objects. From
Fig. 1 we notice that, if we decrease the linear size by a factor of b = 1/3, the
total length (mass) of the curve is decreased by a factor of 1/4, i.e.,

\[ M(\tfrac{1}{3}L) = \tfrac{1}{4}M(L) . \tag{2} \]

This feature is very different from regular curves, where the length of the
object decreases proportional to the linear scale. In order to satisfy (1) and
(2) we are led to introduce a noninteger dimension d, satisfying $1/4 = (1/3)^d$,
i.e., $d = \ln 4/\ln 3$. This noninteger dimension is smaller than the dimension of
the embedding space, here d = 2, and is called the fractal dimension. In order
to distinguish it from the space dimension d, we denote it by $d_f$. Structures
described by a fractal dimension are called fractals. Thus, to include fractal
structures, (1) is generalized by

\[ M(bL) = b^{d_f} M(L) , \tag{3} \]

which is solved by $M(L) = AL^{d_f}$.
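
The iteration rule and the resulting dimension can be checked numerically. The following sketch builds the curve with complex numbers as points; the function names and the unit-length starting segment are choices made here for illustration.

```python
import cmath
import math

def koch_points(n):
    """Vertices of the Koch curve after n iterations, starting from [0, 1]."""
    points = [0j, 1 + 0j]
    rot = cmath.exp(1j * math.pi / 3)      # rotation by 60 degrees
    for _ in range(n):
        refined = [points[0]]
        for p, q in zip(points, points[1:]):
            d = (q - p) / 3                # one third of the segment
            # Replace the middle third by the two sides of a triangle.
            refined += [p + d, p + d + d * rot, p + 2 * d, q]
        points = refined
    return points

def curve_length(points):
    """Total length of the polygonal curve through the given points."""
    return sum(abs(q - p) for p, q in zip(points, points[1:]))

# Fractal dimension from the scaling rule 1/4 = (1/3)**d.
d_f = math.log(4) / math.log(3)
```

After n iterations the curve consists of 4^n segments of length 3^(-n), so `curve_length(koch_points(n))` equals (4/3)^n up to rounding.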

When generating the Koch curve and calculating df, we observe the striking 
property of fractals - the property of self-similarity - which is the basic feature of 
all deterministic and random fractals. If we take a part of a fractal and magnify 
it by the same magnification factor in all directions, the magnified picture cannot 
be distinguished from the original. 

For the Koch curve as well as for all deterministic fractals generated iteratively,
(3) is of course valid only for length scales L below the total linear size $L_0$
of the curve. If the number of iterations n is finite, then (3) is valid only above
a lower cutoff length $a_0$, $a_0 = L_0/3^n$ for the Koch curve. Hence, for a finite
number of iterations there exist two cutoff length scales in the system, an upper
cutoff $L_0$ representing the total linear size of the fractal, and a lower cutoff $a_0$.
This feature of having two characteristic length scales is shared by all fractals
in nature.

Fig. 1. The first iterations of the Koch curve. The fractal dimension of the Koch curve
is $d_f = \ln 4/\ln 3$

The Koch curve can be viewed as a mathematical model for coastlines. Similar
to realistic coastlines such as the coast of Norway or the coast of Great
Britain, the length of the Koch curve increases continuously when the length
scale is decreased. Using as length scale a stick of length $\ell = (1/3)^m$ ($m =
0, 1, 2, \ldots$), one obtains for the length of the curve

\[ L_c = (4/3)^m = [(1/3)^m]^{1-\ln 4/\ln 3} = \ell^{\,1-d_f} , \]

i.e., the fractal dimension determines the way $L_c$ tends to infinity for $\ell$ approaching
zero. Using good maps of Norway or Great Britain, it is not difficult to verify
in this way that the fractal dimensions of the coastlines are $d_f = 1.5$ for Norway
and $d_f = 1.3$ for Great Britain [1, 2].

Fig. 2. The Sierpinski gasket. The fractal dimension of the Sierpinski gasket is
$d_f = \ln 3/\ln 2$



2.2 The Sierpinski Gasket 

The Sierpinski gasket [1, 2, 3] is generated by dividing a full triangle into four
smaller triangles and removing the central triangle (see Fig. 2). In the following
iterations, this procedure is repeated by dividing each of the remaining triangles
into four smaller triangles and removing the central ones.

To obtain the fractal dimension, we consider the mass of the gasket within a
linear size L and compare it with the mass within L/2. Since $M(L/2) = M(L)/3$,
we have $d_f = \ln 3/\ln 2 = 1.585$.
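
The mass scaling can be verified on a lattice version of the gasket. The sketch below uses a right-angled variant in which the site (i, j) is occupied when the binary representations of i and j share no common 1 bit; this bitwise construction is a convenience chosen here and is not the triangle-removal procedure of Fig. 2, although it has the same scaling.

```python
import math

def gasket_mass(L):
    """Number of occupied sites of the lattice gasket within linear size L."""
    return sum(1 for i in range(L) for j in range(L) if i & j == 0)

def estimated_dimension(L):
    """Estimate d_f from M(L) / M(L/2) = 2**d_f."""
    return math.log(gasket_mass(L) / gasket_mass(L // 2)) / math.log(2)
```

For L a power of two, `gasket_mass(L)` triples whenever L is doubled, so `estimated_dimension` returns ln 3/ln 2.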

Next we consider random fractal structures, which are more important since 
they occur in nature. 



3 Random Fractals 

3.1 The Random-Walk Trail

Imagine a random walker on a square lattice or a simple cubic lattice. In one
unit of time the random walker advances one step of length a to a randomly
chosen nearest neighbor site. Let us assume that the walker is unwinding a wire
that he connects to each site along his way. The length (mass) M of the wire
that connects the random walker with his starting point is proportional to the
number of steps t performed by the walker (Fig. 3). After t steps, the actual
position of the walker is described by the vector

\[ \mathbf{r}(t) = a\sum_{\tau=1}^{t}\mathbf{e}_\tau , \tag{4} \]

where $\mathbf{e}_\tau$ denotes the unit vector pointing in the direction of the jump at the
$\tau$th step and a is the lattice constant.

Fig. 3. A random walk in a square lattice. The lattice constant a = 1 is equal to the
jump length of the random walker

The mean distance the random walker has traveled after t steps is described
by the root mean square displacement $R(t) \equiv \langle r^2(t)\rangle^{1/2}$, where the average $\langle\cdots\rangle$
is over all random walk configurations on the lattice. From (4) we obtain

\[ R^2(t) = \langle r^2(t)\rangle = a^2\sum_{\tau,\tau'=1}^{t}\langle\mathbf{e}_\tau\cdot\mathbf{e}_{\tau'}\rangle
   = a^2 t + a^2\sum_{\tau\neq\tau'}\langle\mathbf{e}_\tau\cdot\mathbf{e}_{\tau'}\rangle . \tag{5} \]

Since jumps at different steps $\tau$ and $\tau'$ are uncorrelated, we have
$\langle\mathbf{e}_\tau\cdot\mathbf{e}_{\tau'}\rangle = \delta_{\tau\tau'}$, and therefore

\[ R(t) = a\,t^{1/2} . \tag{6} \]

R(t) characterizes the spatial extension of the curve generated by the random
walker which we call the random-walk (RW) trail. According to (6), the length
of the trail (which is proportional to the mass of the wire) increases as $R(t)^2$,
and therefore the RW trail has the fractal dimension $d_f = 2$. Since $R^2(t) \sim t$
holds for all dimensions d, the fractal dimension of the RW trail does not depend
on the embedding space dimension.
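
The result (6) is easy to test by direct simulation. The sketch below averages the squared end-to-end distance over many independent walks on the square lattice; the function name, the number of walks and the seed are arbitrary choices made for this illustration.

```python
import random

def mean_square_displacement(t, walks=2000, a=1.0, seed=0):
    """Average of r^2 after t steps over independent square-lattice walks."""
    rng = random.Random(seed)
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))   # the four unit vectors
    total = 0.0
    for _ in range(walks):
        x = y = 0
        for _ in range(t):
            dx, dy = rng.choice(steps)
            x += dx
            y += dy
        total += a * a * (x * x + y * y)
    return total / walks
```

Within statistical error `mean_square_displacement(t)` grows linearly in t with slope a^2, which is the statement $R^2(t) = a^2 t$.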

3.2 Self-Avoiding Walks

Self-avoiding walks (SAWs) are defined as the subset of all nonintersecting random
walk configurations. As was found by Flory in 1944 [5], the end-to-end
distance of SAWs scales with the number of steps t as

\[ R(t) \sim t^{\nu} , \tag{7} \]

with $\nu = 3/(d+2)$ for $d \le 4$ and $\nu = 1/2$ for $d > 4$. Since t is proportional to
the mass of the RW trail, it follows from (7) that $d_f = 1/\nu$. Self-avoiding walks
serve as a model for polymers in solution, see [6] and Chap. 6 in [3].

3.3 Percolation

Consider a square lattice, where each site is occupied randomly with probabil- 
ity p or empty with probability 1 — p. For large lattices p is identical to the 
concentration of occupied sites. At low concentration p, the occupied sites are 
either isolated or form small clusters (Fig. 4a). Two occupied sites belong to 
the same cluster if they are connected by a path of nearest-neighbor occupied 
sites. When p is increased, the average size of the clusters increases. At a critical 
concentration pc (also called the percolation threshold) a large cluster appears 
which connects opposite edges of the lattice (Fig. 4b). This cluster is called the 
infinite cluster, since its size diverges when the size of the lattice is increased to 
infinity. When p is increased further, the density of the infinite cluster increases, 
since more and more sites become part of the infinite cluster, and the average
size of the finite clusters decreases (Fig. 4c).
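
Site percolation on a small square lattice can be explored with a few lines of code. In the sketch below a lattice is occupied at random and a breadth-first search checks whether a cluster of nearest-neighbor occupied sites connects the top and bottom rows; the function names and the top-to-bottom spanning criterion are choices made here.

```python
import random
from collections import deque

def occupy(n, p, seed=0):
    """n x n lattice; each site is occupied independently with probability p."""
    rng = random.Random(seed)
    return [[rng.random() < p for _ in range(n)] for _ in range(n)]

def spans(grid):
    """True if occupied nearest-neighbor sites connect the top and bottom rows."""
    n = len(grid)
    seen = [[False] * n for _ in range(n)]
    queue = deque((0, j) for j in range(n) if grid[0][j])
    for i, j in queue:
        seen[i][j] = True
    while queue:
        i, j = queue.popleft()
        if i == n - 1:
            return True
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < n and grid[a][b] and not seen[a][b]:
                seen[a][b] = True
                queue.append((a, b))
    return False
```

Counting how often `spans(occupy(n, p, seed))` is true over many seeds, as a function of p, shows the onset of a spanning cluster sharpening with increasing n.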




Fig. 4 a-c. Square lattice of size 20 x 20. Sites have been randomly occupied with
probabilities p [p = 0.20 (a), 0.59 (b), 0.80 (c)]. Sites belonging to finite clusters are
marked by full circles, whereas sites on the infinite cluster are marked by open circles



The percolation transition is characterized by the geometrical properties of
the clusters near $p_c$. The probability $P_\infty$ that a site belongs to the infinite cluster
is zero below $p_c$ and increases above $p_c$ as

\[ P_\infty \sim (p - p_c)^{\beta} . \tag{8} \]

The linear size of the finite clusters, below and above $p_c$, is characterized by
the correlation length $\xi$. The correlation length is defined as the mean distance
between two sites on the same finite cluster and represents the characteristic
length scale in percolation. When p approaches $p_c$, $\xi$ increases as

\[ \xi \sim |p - p_c|^{-\nu} , \tag{9} \]

with the same exponent $\nu$ below and above the threshold. While $p_c$ depends
explicitly on the type of lattice (e.g., $p_c = 0.59277$ for the square lattice and
1/2 for the triangular lattice), the critical exponents $\beta$ and $\nu$ are universal and
depend only on the dimension d of the lattice, but not on the type of the lattice.

Near $p_c$, on length scales smaller than $\xi$, both the infinite cluster and the
finite clusters are self-similar. Above $p_c$, on length scales larger than $\xi$, the
infinite cluster can be regarded as a homogeneous system which is composed of many
unit cells of size $\xi$. Mathematically, this can be summarized as

$$M(r) \sim \begin{cases} r^{d_f} , & r \ll \xi , \\ r^{d} , & r \gg \xi . \end{cases} \qquad (10)$$

The fractal dimension $d_f$ can be related to $\beta$ and $\nu$:

$$d_f = d - \frac{\beta}{\nu} . \qquad (11)$$

Since $\beta$ and $\nu$ are universal exponents, $d_f$ is also universal. One obtains
$d_f = 91/48$ in $d = 2$ and $d_f \cong 2.5$ in $d = 3$ [4, 7].
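Relation (11) is easy to check with exact arithmetic. The sketch below assumes the exact two-dimensional percolation exponents $\beta = 5/36$ and $\nu = 4/3$, which are standard values but are not quoted in the text; the function name is purely illustrative.

```python
from fractions import Fraction


def fractal_dimension(d, beta, nu):
    """Eq. (11): d_f = d - beta/nu."""
    return d - beta / nu


# Exact 2D percolation exponents (assumed here, not quoted in the text).
beta_2d = Fraction(5, 36)
nu_2d = Fraction(4, 3)

df_2d = fractal_dimension(2, beta_2d, nu_2d)
print(df_2d)  # 91/48
```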

The percolation model has found numerous applications in physics, chem- 
istry, and biology, where occupied and empty sites may represent very different 
physical, chemical, or biological properties. Examples are the physics of two com- 
ponent systems (the random resistor, magnetic or superconducting networks), 
the polymerization process in chemistry, and the spreading of epidemics and 
forest fires, see [4, 7] and [8]. 

4 The “Chemical Distance” $\ell$

The fractal dimension, however, is not sufficient to fully characterize a fractal.
An important fractal substructure is the shortest path on the fractal between
two distant fractal points (Fig. 5). The length $\ell$ of this path, also called the
“chemical distance”, increases, on average, with the spatial distance $r$ between
both points as

$$\ell(r) \sim r^{d_{\min}} . \qquad (12)$$

The average mass of a cluster within a chemical distance $\ell$ increases with $\ell$ as

$$M(\ell) \sim \ell^{d_\ell} , \qquad (13)$$

which defines the “chemical dimension” $d_\ell$. Since $M$ scales with $r$ as $M(r) \sim r^{d_f}$
it follows that $d_\ell = d_f/d_{\min}$. For linear fractal structures like coastlines or the
RW trail one has $d_\ell = 1$ and thus $d_{\min} = d_f$. For percolation clusters one has
$d_{\min} \cong 1.13$ ($d = 2$) and $d_{\min} \cong 1.37$ ($d = 3$) [4].
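On a lattice, the chemical distance from a chosen site to all other sites of the same cluster can be measured by a breadth-first search. The following is a minimal sketch, assuming nearest-neighbor connectivity on the square lattice; the function name and the small U-shaped test cluster are illustrative and not taken from the enclosed programs.

```python
from collections import deque


def chemical_distances(occupied, source):
    """Breadth-first search over a set of occupied (x, y) sites.

    Returns a dict mapping every site connected to `source` to its
    chemical distance (length of the shortest path on the cluster).
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x, y = queue.popleft()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in occupied and nb not in dist:
                dist[nb] = dist[(x, y)] + 1
                queue.append(nb)
    return dist


# U-shaped cluster: the two end points are close in space but far apart
# along the cluster.
u_shape = {(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)}
dist = chemical_distances(u_shape, (0, 0))
print(dist[(2, 0)])  # 6, although the euclidean distance is only 2
```

Averaging the distances obtained in this way over many sites and configurations, as a function of $r$, gives a numerical estimate of $d_{\min}$ via (12).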

More information on the connectivity of a fractal is obtained from the probability
$\Phi(\ell|r)$ of finding a site with a chemical distance $\ell$ at fixed distance $r$ from
a cluster site, and the related probability $\Phi(r|\ell)$ of finding a site with a spatial
distance $r$ at fixed chemical distance $\ell$ from a cluster site. Numerically, $\Phi(r|\ell)$
can be obtained as follows. First one chooses one cluster site as a center site and
counts the number $N(\ell)$ of all sites that are at a given chemical distance $\ell$ from
this site. Among these $N(\ell)$ sites, there are $N(r,\ell)$ sites at a euclidean distance $r$
from the center. The fraction $N(r,\ell)/N(\ell)$, averaged over many configurations,
can be identified with $\Phi(r|\ell)$. The related probability $\Phi(\ell|r)$ is obtained in a
similar way by calculating the number of sites $N(r)$ that are at euclidean distance $r$
from the center, and averaging over the fraction $N(r,\ell)/N(r)$. Since by definition
$\sum_r N(r,\ell) = N(\ell)$ and $\sum_\ell N(r,\ell) = N(r)$, the probability densities satisfy, in
the continuum limit, the normalization condition $\int \Phi(r|\ell)\,dr = \int \Phi(\ell|r)\,d\ell = 1$.

Fig. 5. The shortest path between two points A and B of the infinite percolation
cluster generated at the percolation threshold of the square lattice site problem. The
chemical distance is the length of the shortest path
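Since $N(\ell)$ and $N(r)$ scale as

$$N(\ell) \sim \ell^{d_\ell - 1} \quad \text{and} \quad N(r) \sim r^{d_f - 1} , \qquad (14)$$

both probability densities are related by

$$\Phi(\ell|r) \sim \frac{\ell^{d_\ell - 1}}{r^{d_f - 1}}\, \Phi(r|\ell) . \qquad (15)$$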
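The step from (14) to (15) can be spelled out in one line. The following is a sketch that ignores the configurational average and constant prefactors; it uses only the counts $N(r,\ell)$, $N(r)$ and $N(\ell)$ introduced above, with both probabilities written as fractions of the same joint count.

```latex
\Phi(\ell|r)\,N(r) \;=\; N(r,\ell) \;=\; \Phi(r|\ell)\,N(\ell)
\quad\Longrightarrow\quad
\Phi(\ell|r) \;=\; \frac{N(\ell)}{N(r)}\,\Phi(r|\ell)
\;\sim\; \frac{\ell^{\,d_\ell-1}}{r^{\,d_f-1}}\,\Phi(r|\ell) .
```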

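For RW trails, $\Phi(\ell|r)$ can be determined analytically. By construction, the
length $\ell$ of the trail is identical to the number of steps $t$ performed by the
random walker. Hence, for the RW trail, $\Phi(r|\ell)$ has the same form as the
well-known probability $P(r,t)$ for finding the random walker at time step $t$ a distance
$r$ from his starting point, i.e.

$$\Phi(r|\ell) \sim \ell^{-d/2}\, r^{d-1} \exp\left[-\frac{d}{2}\,\frac{r^2}{\ell}\right] . \qquad (16)$$

Thus we obtain with (15)

$$\Phi(\ell|r) \sim \frac{1}{\ell} \left(\frac{r}{\ell^{1/2}}\right)^{d-2} \exp\left[-\frac{d}{2}\,\frac{r^2}{\ell}\right] . \qquad (17a)$$

It is convenient to rewrite (17a) as

$$\Phi(\ell|r) = \frac{C_1}{\ell} \left(\frac{r}{\ell^{1/d_{\min}}}\right)^{g} \exp\left[-C_2 \left(\frac{r}{\ell^{1/d_{\min}}}\right)^{\delta}\right] \qquad (17b)$$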



with $d_{\min} = 2$, $\delta = d_{\min}/(d_{\min} - 1)$, $g = d - 2$ and $C_2 = d/2$. Equation (17b) holds
also for percolation clusters with $g = 1.35$ ($d = 2$) and $g = 1.5$ ($d = 3$) [9, 10]. For
fixed $r$, $\Phi(\ell|r)$ has a maximum at $\ell_{\max} \sim r^{d_{\min}}$. By definition we have $\Phi(\ell|r) = 0$
below some cutoff length $\ell_{\min}(r,N)$, which is the shortest chemical distance $\ell$ a
site at the distance $r$ from a central site can have when $N$ configurations are
considered. In contrast to $\ell_{\max}$, $\ell_{\min}$ depends strongly on $N$ [11]. This is shown
explicitly in Fig. 6a, where for RW trails on the sc lattice $\ell_{\min}$ is plotted as a
function of $r$ for $N$ ranging from 1 to 10^. The figure shows that



$$\ell_{\min}(r,N) = \begin{cases} r , & r < r_c(N) , \\ a_{\min}(N)\, r^{d_{\min}} , & r > r_c(N) , \end{cases} \qquad (18)$$

with $d_{\min} = 2$ for the RW trail.

To determine the crossover value $r_c(N)$ analytically, we note that in order to
find one configuration with $\ell = r = r_c$ we have to generate about $N = z^{r_c - 1}$
configurations, with the coordination number $z = 6$ for the sc lattice. This
yields

$$r_c(N) = 1 + \frac{\ln N}{\ln z} . \qquad (19)$$

To determine $a_{\min}(N)$ we assume scaling,

$$\ell_{\min}(r,N) = r_c(N)\, g\big(r/r_c(N)\big) . \qquad (20)$$

In order to satisfy (18), we must require $g(x) = x$ for $x < 1$ and $g(x) = x^{d_{\min}}$
for $x \gg 1$. This yields

$$a_{\min}(N) = [r_c(N)]^{1 - d_{\min}} = \left(1 + \frac{\ln N}{\ln z}\right)^{1 - d_{\min}} . \qquad (21)$$

Figure 6b shows $\ell_{\min}/r_c(N)$ versus $r/r_c(N)$. The data collapse strongly supports
the scaling assumption (20). Equations (18-21) also hold for percolation clusters
at $p_c$ when $r_c(N)$ is substituted by $r_c(N) = (\ln z + \ln N)/\ln(1/p_c)$ [11]. As
discussed above we have $d_{\min} \cong 1.13$ ($d = 2$) and $d_{\min} \cong 1.37$ ($d = 3$).

The form of $\Phi(\ell|r)$ and the dependence of $\ell_{\min}$ on $N$ are characteristic for
random fractals with $d_{\min} > 1$.
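The crossover described by (18)-(21) can be tabulated directly. The following is a minimal sketch for RW trails on the sc lattice, assuming the forms $r_c(N) = 1 + \ln N/\ln z$ and $a_{\min}(N) = [r_c(N)]^{1-d_{\min}}$ with $z = 6$ and $d_{\min} = 2$ as defaults; the function names are illustrative.

```python
import math


def r_c(n_conf, z=6):
    """Crossover distance of Eq. (19)."""
    return 1.0 + math.log(n_conf) / math.log(z)


def a_min(n_conf, d_min=2.0, z=6):
    """Prefactor of Eq. (21)."""
    return r_c(n_conf, z) ** (1.0 - d_min)


def l_min(r, n_conf, d_min=2.0, z=6):
    """Cutoff length of Eq. (18): linear below r_c, power law above."""
    if r < r_c(n_conf, z):
        return float(r)
    return a_min(n_conf, d_min, z) * r ** d_min


for n_conf in (1, 36, 1296):
    print(n_conf, r_c(n_conf), l_min(10.0, n_conf))
```

The two branches of `l_min` meet continuously at $r = r_c(N)$, which is exactly the condition that fixes $a_{\min}(N)$ in (21).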
Fig. 6. (a) The minimum distance $\ell_{\min}(r,N)$ versus $r$ for RW trails on the sc lattice,
for N = 1 (full circles), 11 (squares), 128 (full triangles) and 10^ (triangles). (b) Scale
plot of $\ell_{\min}(r,N)/r_c(N)$ versus $r/r_c(N)$ for the same N values as above. For N below
10^, averages have been performed over typically 100 sets of N configurations

5 Random Walks on Fractals

Next we discuss a random walker on a random fractal structure. 



5.1 Root Mean Square Displacement R(t) 

First we consider a random walker on the RW trail. We know that along the trail
(in “$\ell$-space”) diffusion is one dimensional. After $t$ time steps, the walker has
traveled, on average, the chemical distance $\ell(t) \sim t^{1/2}$ and thus has reached a
distance $R(t) \sim \ell^{1/d_{\min}} \sim t^{1/(2 d_{\min})}$ from his starting point.
Accordingly, instead of $R(t) \sim t^{1/2}$ we have now

$$R(t) \sim t^{1/d_w} \qquad (22)$$

with $d_w = 2 d_{\min} = 4$ the fractal dimension of the random walk. Equation (22)
with $d_w > 2$ is valid for all fractal structures. In general, however, one cannot
express $d_w$ rigorously by $d_f$ or $d_{\min}$. For percolation clusters, one has approximately
$d_w = 3 d_f/2$ [12].

It is clear that the anomalous diffusion law (22) can only occur on length
scales where the considered structure is self-similar. Let us consider, for example,
the infinite percolation cluster above $p_c$ that is fractal below the correlation
length $\xi$ and compact above $\xi$. According to (22), the random walker needs
$t_\xi \sim \xi^{d_w}$ time steps to travel a distance of the order $\xi$. For small times $t \ll t_\xi$,
the random walker explores the fractal regime and (22) holds. For long times,
$t \gg t_\xi$, on the other hand, he explores the whole compact cluster and diffusion
is normal, $R(t) \sim t^{1/2}$.

The exponent $d_w$ is experimentally accessible, for example, in chemical
reactions on fractal structures. Equation (22) implies that the number of distinct
sites visited scales as $S(t) \sim [R(t)]^{d_f} \sim t^{d_f/d_w}$.

For annihilation reactions ($A + A \to 0$), the concentration $K$ of the diffusing
A particles decreases as $dK/dt \sim -K^2\, dS/dt$. Hence we expect $K \sim t^{-2/3}$ on
percolation clusters, which has been confirmed experimentally by [13]. For other
ways of measuring $d_w$ we refer to [4] and [8].



5.2 The Mean Probability Density 



Let us again start with the simple RW trail. Along the trail, diffusion is
one-dimensional and the probability of finding a random walker after $t$ time steps on
a site $i$ at chemical distance $\ell$ from its starting point is given by

$$P(\ell,t) = P(0,t) \exp\left\{-\left[\frac{\ell}{\xi_\ell(t)}\right]^{2}\right\} , \qquad (23)$$

where $\xi_\ell(t)$ is proportional to the displacement $\ell(t) \sim t^{1/2}$ along the chain.
We now anticipate that the length $\ell$ of the shortest path connecting a site
$i$ with the origin is also the relevant physical length for diffusion in percolation
clusters, such that the fluctuations of the probability $P_i(\ell,t)$ to find the random
walker after $t$ time steps on a site $i$ at chemical distance $\ell$ from the origin are
small (in contrast to the large fluctuation of the analogous quantity in $r$ space).
For simplicity, we assume that $P_i(\ell,t)$ only depends on $\ell$ and $t$ for all sites $i$ at
fixed $\ell$, and decays as

$$P_i(\ell,t) = P(\ell,t) = P(0,t) \exp\left\{-\left[\frac{\ell}{\xi_\ell(t)}\right]^{v}\right\} , \qquad (24)$$

where $\xi_\ell(t) \sim t^{d_{\min}/d_w}$ is proportional to the mean chemical distance traveled
by the random walker and $v = d_w/(d_w - d_{\min})$ for $\ell \gg \xi_\ell$ [14]. If we define
by $P_i(r,t)$ the probability that the random walker is, after $t$ time steps, on a
site $i$ at euclidean distance $r$, we obtain the mean probability $P(r,t)$ for a single
configuration by averaging over all $N(r)$ sites $i$ at fixed $r$,

$$P(r,t) = \frac{1}{N(r)} \sum_{i=1}^{N(r)} P_i(r,t) . \qquad (25)$$

Among the $N(r)$ sites at distance $r$, $N(\ell,r)$ sites are at chemical distance $\ell$
from the center. Therefore we can write (25) as [15]

$$P(r,t) = \frac{1}{N(r)} \sum_{\ell} N(\ell,r)\, P(\ell,t) . \qquad (26)$$

Averaging over $N \gg 1$ configurations and replacing the sum in (26) by an
integral yields

$$\langle P(r,t)\rangle_N = \int_0^{\infty} \Phi(\ell|r;N)\, P(\ell,t)\, d\ell , \qquad (27)$$

with $\Phi(\ell|r;N)$ from (17).

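To evaluate the integral (27), we follow [15] and [16] and use the method of
steepest descent. Using (17b) and (24) we obtain

$$\langle P(r,t)\rangle_N = P(0,t) \int_{\ell_{\min}(r,N)}^{\infty} \zeta(\ell)\, \exp[-\eta(\ell)]\, d\ell \qquad (28)$$

with $\eta(\ell) = C_2\, (r/\ell^{1/d_{\min}})^{\delta} + [\ell/\xi_\ell(t)]^{v}$ and $\zeta(\ell) = (C_1/\ell)\, (r/\ell^{1/d_{\min}})^{g}$. The
saddle $\ell^*(r)$ occurs at $d\eta/d\ell\,|_{\ell=\ell^*} = 0$: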
$$\ell^*(r) \propto \xi_\ell(t) \left(\frac{r}{\xi_r}\right)^{u/v} \qquad (29)$$

with

$$u = \frac{v\, d_{\min}}{1 + v\,(d_{\min} - 1)} = \frac{d_w}{d_w - 1} \qquad (30)$$

and

$$\xi_r \propto [\xi_\ell(t)]^{1/d_{\min}} \sim t^{1/d_w} , \qquad (31)$$

where $\xi_r$ is, up to a proportionality constant of order unity, the root mean square
displacement $R(t)$ of the random walker. Figure 7a shows $P(\ell,t)$, $\Phi(\ell|r)$ and the
product of both for diffusion on the RW trail, for t = 2 × 10^ and r = 16. The
figure shows that, for the considered $r$ value, the integrand of (28) is peaked
strongly at $\ell^*$.
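The position of the saddle can be checked numerically. The sketch below assumes the RW-trail parameters in $d = 3$ ($C_2 = 3/2$, $\delta = 2$, $d_{\min} = 2$, $v = 2$) and the form $\eta(\ell) = C_2 (r/\ell^{1/d_{\min}})^{\delta} + (\ell/\xi_\ell)^{v}$. Setting $d\eta/d\ell = 0$ gives $\ell^* = [C_2 \delta\, r^{\delta} \xi_\ell^{v} / (v\, d_{\min})]^{1/(v + \delta/d_{\min})}$, which is compared with a simple ternary search. The value $\xi_\ell = 100$ and all function names are illustrative assumptions.

```python
def eta(l, r, xi_l, C2=1.5, delta=2.0, d_min=2.0, v=2.0):
    """Exponent of the integrand: geometric term plus diffusion term."""
    return C2 * (r / l ** (1.0 / d_min)) ** delta + (l / xi_l) ** v


def saddle_analytic(r, xi_l, C2=1.5, delta=2.0, d_min=2.0, v=2.0):
    """Root of d(eta)/dl = 0, obtained by hand from the form of eta above."""
    rhs = C2 * delta * r ** delta * xi_l ** v / (v * d_min)
    return rhs ** (1.0 / (v + delta / d_min))


def saddle_numeric(r, xi_l, lo=1e-3, hi=1e6, iterations=200):
    """Ternary search for the minimum of eta on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if eta(m1, r, xi_l) < eta(m2, r, xi_l):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


l_star_exact = saddle_analytic(16.0, 100.0)
l_star_search = saddle_numeric(16.0, 100.0)
print(l_star_exact, l_star_search)
```

The ternary search relies on $\eta(\ell)$ having a single minimum for $\ell > 0$, which holds for these parameters because both terms are convex.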

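Following the method of steepest descent, we can write approximately

$$\langle P(r,t)\rangle_N \cong P(0,t)\, \zeta(\ell^*(r))\, \exp[-\eta(\ell^*(r))] \int_{-\infty}^{\infty} \exp\left\{-\tfrac{1}{2}\, \eta''(\ell^*(r))\, [\ell - \ell^*(r)]^2\right\} d\ell , \qquad (32)$$

which yields

$$\ln \langle P(r,t)\rangle_N \sim -\left(\frac{r}{\xi_r}\right)^{u} . \qquad (33)$$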

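By definition, (33) holds only for $\ell_{\min}(r,N) < \ell^*(r) < \ell_{\max}(r)$, and this
restriction determines the $r$ regime $r_1 < r < r_\times(N)$ where (33) is valid. We find
with $g$, $\delta$ and $C_2$ from $\Phi(\ell|r)$ [11]

$$r_1 = \xi_r \left(\frac{g + d_{\min}}{C_2\, \delta}\right)^{1/u} , \qquad (34a)$$

$$r_\times(N) = \xi_r\, [r_c(N)]^{1/u} . \qquad (34b)$$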

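Above $r_\times(N)$, the integrand in (27) is peaked sharply at $\ell = \ell_{\min}(r,N)$ (see
Fig. 7b), and [11]

$$\ln \langle P(r,t)\rangle_N \sim -\left[\frac{\ell_{\min}(r,N)}{\xi_\ell(t)}\right]^{v} , \qquad r > r_\times(N) . \qquad (35)$$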



To determine the integral (27) in the short-distance regime ($r < r_1$), we note
that $r < r_1$ implies $\ell_{\max}(r) < \xi_\ell$. Hence for $\ell < \xi_\ell$ the behavior of $\Phi(\ell|r)$ differs
strongly from the behavior of $P(\ell,t)$. Whereas $\Phi(\ell|r)$ shows a steep maximum
at $\ell_{\max}$, $P(\ell,t)$ is nearly constant (see Fig. 7c). Hence we can assume that to a
very good approximation, the integrand of (27) can be written as [17]

$$\Phi(\ell|r)\, P(\ell,t) \cong \begin{cases} \Phi(\ell|r)\, P(0,t) , & \ell < \ell_0 , \\ 0 , & \ell > \ell_0 . \end{cases} \qquad (36)$$

Fig. 7. The functions $P(\ell,t)$ (squares), $\Phi(\ell|r)$ (full triangles) and $P(\ell,t)\,\Phi(\ell|r)$ (full
circles) for RW trails where (a) t = 2 × 10^ and r = 16, (b) t = 2 × 10^ and r = 36,
and (c) t = 5 × 10^ and r = 10. In (a) the integrand $P(\ell,t)\,\Phi(\ell|r)$ of (27) shows a
steep maximum at the saddle $\ell^*(r)$, while in (b) it shows a steep maximum at the
cutoff value $\ell_{\min}(r,N)$; in (c), finally, $\Phi(\ell|r)\,P(\ell,t)$ can be well approximated by the
normalized step function of (36) (full line)



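The cutoff length $\ell_0$ is determined by the normalization condition

$$\int_0^{\ell_0} P(0,t)\, d\ell = \int_0^{\infty} P(\ell,t)\, d\ell . \qquad (37)$$

Inserting (24) into (37) we obtain $\ell_0 = \xi_\ell(t)\, \Gamma(1/v + 1)$. Substituting (36)
into (27) yields

$$\langle P(r,t)\rangle_N = P(0,t) \int_0^{\ell_0} \Phi(\ell|r)\, d\ell . \qquad (38)$$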



Since $\int_0^\infty \Phi(\ell|r)\, d\ell = 1$ and $\Phi(\ell|r) = (C_1/\ell)\,(r/\ell^{1/d_{\min}})^{g}$ for $\ell \gg \ell_{\max}$ we
obtain finally in the short-distance regime

$$\frac{\langle P(r,t)\rangle_N}{P(0,t)} = 1 - a \left(\frac{r}{\xi_\ell^{1/d_{\min}}}\right)^{g} \qquad (39a)$$

with

$$a = \frac{C_1\, d_{\min}}{g\, [\Gamma(1/v + 1)]^{g/d_{\min}}} \qquad (39b)$$

independent of $N$. According to (39), the decay of $\langle P(r,t)\rangle_N$ is characterized by
the substrate geometry represented by the structural exponent $g$.

Fig. 8. Logarithm of the normalized mean probability density of random walks
$-\ln[\langle P(r,t)\rangle_N / P(0,t)]$ versus $r/R(t)$ for (a) RW trails (t = 10^, N = 25 (full circles)
and 250 (triangles)) and (b) site percolation clusters on the square lattice (t = 400,
N = 5 (full circles) and 50 (triangles)), compared with the typical probability density
$-\ln[\langle P(r,t)\rangle_{\rm typ} / P(0,t)]$ (circles), which corresponds to the case N = 1 [11]; $R(t)$ is
the root mean square displacement

Fig. 9. The normalized mean probability density $\langle P(r,t)\rangle_N / P(0,t)$ of random walks
versus the scaled distance of (39a) for (a) RW trails on the sc lattice ($g = 1$, $t = 5 \cdot 10^6$,
$\xi_\ell = 2200$, N = 10^) and (b) site percolation clusters on the square lattice ($g = 1.35$,
t = 10^, $\xi_\ell = 40$, N = 100). The full lines in the plots represent the theoretical
predictions of (39) without any fit parameters

To test our predictions in the asymptotic regime (33-35), we have performed
Monte Carlo simulations (with quadruple precision) of random walks on RW
trails in the sc lattice and on site percolation clusters on the square lattice.
Figure 8 shows $\ln[\langle P(r,t)\rangle_N / P(0,t)]$ for several $N$ values. The $N$-dependent
crossover is clearly seen. The slopes of the curves correspond to our predictions,
$u = 4/3$ and $v\,d_{\min} = 4$ for RW trails (Fig. 8a) and $u = 1.53$ and $v\,d_{\min} = 1.86$
for percolation in $d = 2$ (Fig. 8b).
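The quoted slopes follow from $u = d_w/(d_w - 1)$ and $v = d_w/(d_w - d_{\min})$. The short sketch below reproduces them; the value $d_w = 2.87$ for two-dimensional percolation is an assumption taken from the general literature and is not stated in the text, and the function name is illustrative.

```python
def slope_exponents(d_w, d_min):
    """Return (u, v * d_min) for given walk and shortest-path dimensions."""
    u = d_w / (d_w - 1.0)
    v = d_w / (d_w - d_min)
    return u, v * d_min


# RW trail: d_w = 4 and d_min = 2, as given in the text.
u_trail, vd_trail = slope_exponents(4.0, 2.0)

# 2D percolation: d_min = 1.13 from the text, d_w = 2.87 assumed.
u_perc, vd_perc = slope_exponents(2.87, 1.13)

print(u_trail, vd_trail)
print(round(u_perc, 2), round(vd_perc, 2))  # 1.53 1.86
```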

To test the prediction (39) in the short-distance regime ($r < r_1$), we have
performed computer simulations of random walks on RW trails in $d = 3$ where
$a = 1.466$ and on percolation clusters in $d = 2$ where $a = 1.3 \pm 0.2$. Figure 9
shows that our predictions are in full quantitative agreement with the numerical
results. The full lines represent our theoretical results, (39), with no fit parameter
involved.
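The value of $a$ for RW trails in $d = 3$ can be estimated in a few lines. The sketch assumes the form $a = C_1 d_{\min} / (g\,[\Gamma(1/v + 1)]^{g/d_{\min}})$ and fixes $C_1$ by normalizing $\Phi(\ell|r) = (C_1/\ell)(r/\ell^{1/2})^{g} \exp(-C_2 r^2/\ell)$ over $\ell$, which gives $C_1 = C_2^{g/2}/\Gamma(g/2)$ after substituting $s = r^2/\ell$. The function names are illustrative, and the normalization step is a reconstruction rather than a quotation.

```python
import math


def c1_rw_trail(d):
    """Normalization constant C_1 for RW trails in d > 2 dimensions."""
    g = d - 2
    c2 = d / 2.0
    # Integral of (1/l) (r^2/l)^(g/2) exp(-c2 r^2/l) dl over l > 0
    # equals Gamma(g/2) / c2^(g/2); C_1 is its inverse.
    return c2 ** (g / 2.0) / math.gamma(g / 2.0)


def amplitude_a(c1, g, d_min, v):
    """Amplitude of the short-distance decay of the mean density."""
    return c1 * d_min / (g * math.gamma(1.0 / v + 1.0) ** (g / d_min))


a_trail = amplitude_a(c1_rw_trail(3), g=1, d_min=2.0, v=2.0)
print(a_trail)  # close to the value 1.466 quoted in the text
```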



6 Biased Diffusion 

Next we consider a random walker on the infinite percolation cluster under the
influence of a bias field. The bias field is modeled by giving the random walker
a higher probability $P_+$ of moving along the direction of the field and a lower
probability $P_-$ of moving against the field,

$$P_\pm \sim 1 \pm E , \qquad (40)$$

where $0 < E < 1$ is the strength of the field. The field can be either uniform in
space (“euclidean” bias) or directed in topological space [18, 19]. In a topological
bias (see Fig. 10) every bond between two neighboring cluster sites experiences
a bias that drives the walker away in chemical space from a point source.

For convenience let us start with the topological bias field. If we apply such
a field in a uniform system then the mean distance $R(t)$ of the walker from the
“source” A increases linearly in time, giving the walker a radial velocity. In
a euclidean bias field the walker gets a velocity along the direction of the field.
The question is how this behavior is changed in the infinite percolation cluster at 
the critical concentration pc, where the cluster is self-similar on all length scales 
[18, 19, 20, 21]. We consider a walker travelling from a site A to another site B 
on the cluster. On his way the walker is driven into the loops and dangling ends 
that emanate from the shortest path between A and B. In a topological bias 
field the walker can get “stuck” in loops, as he can get stuck in dangling ends. 
Therefore, both loops and dangling ends act as random delays on the motion of 
the walker, and the percolation cluster can be imagined as a random comb where 
the teeth in the comb act as the random delays on the motion of the walker (see 
Fig. 10). The distribution of the length of the loops and dangling ends in the 
fractal structure determines the biased diffusion. 
Fig. 10. Illustration of a percolation cluster under the influence of a topological bias
field and its mapping to a random comb model (after [4])



At the critical concentration, due to self-similarity on all length scales, the
lengths $L$ of the teeth are expected to follow a power law distribution,

$$P(L) \sim L^{-(\alpha + 1)} , \qquad \alpha > 0 . \qquad (41)$$

The time $\tau$ spent in a tooth increases exponentially with its length $L$,
$\tau \sim [(1 + E)/(1 - E)]^{L}$ [18, 19]. Since the lengths of the teeth are distributed
according to (41), it is easy to show that the waiting times $\tau$ follow the singular
waiting time distribution [18, 19]

$$\Phi(\tau) \sim [\tau\, (\ln \tau)^{\alpha + 1}]^{-1} , \qquad (42)$$

and the system can be mapped onto a linear chain (the backbone of the comb)
where each site $i$ is assigned a waiting time $\tau_i$ according to (42). A random
walker has to wait on average $\tau_i$ time steps before he can jump from site $i$ to
one of the neighboring sites.
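The exponential growth of the trapping time with the tooth length can be made concrete with a small exact calculation. The following sketch models a tooth as a one-dimensional dead end of $L$ sites. The walker steps deeper into the tooth with probability $p = (1+E)/2$ and back towards the backbone with probability $q = (1-E)/2$, and a step beyond the closed end is rejected. These modeling choices and the function name are assumptions made for illustration only.

```python
def mean_escape_time(length, bias):
    """Mean number of steps needed to leave a dead-end tooth.

    The walker starts on the first tooth site and leaves when it steps
    back onto the backbone. With D_k = T_k - T_(k-1) and T_0 = 0, the
    mean first-passage times obey D_L = 1/q at the closed end and
    D_k = (1 + p * D_(k+1)) / q for the sites below it.
    """
    p = (1.0 + bias) / 2.0  # step deeper into the tooth (along the field)
    q = (1.0 - bias) / 2.0  # step back towards the backbone
    d = 1.0 / q
    for _ in range(length - 1):
        d = (1.0 + p * d) / q
    return d  # equals T_1


for length in (1, 2, 5, 10, 20):
    print(length, mean_escape_time(length, 0.5))

ratio = mean_escape_time(21, 0.5) / mean_escape_time(20, 0.5)
print(ratio)  # approaches (1 + E)/(1 - E) = 3 for E = 0.5
```

For long teeth, each additional site multiplies the mean escape time by $(1+E)/(1-E)$, in line with the exponential law quoted above.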

The singular waiting-time distribution changes the asymptotic laws of diffusion
drastically, from the power law (22) to the logarithmic form [18, 19]

$$\ell \sim \left[\frac{\ln t}{A(E)}\right]^{\alpha} , \qquad (43)$$

where

$$A(E) = \ln\left[\frac{1 + E}{1 - E}\right] \qquad (44)$$

and $\ell$ is the mean distance the walker has traveled along the backbone of the
comb. Equation (43) is rigorous for the random comb, but is also in agreement
with numerical data for the infinite percolation cluster at $p_c$, with $\alpha = 1$ [18, 19].

Fig. 11. Plot of $\langle x(t)\rangle$ versus $\ln t$ for different values of euclidean field strengths $E$
along the xy direction (after [20])



Accordingly, we have the paradoxical situation that on the fractal structure of 
the percolation cluster the motion of a random walker is dramatically slowed 
down by a bias field: the larger the bias field E the stronger the effect. 

A logarithmic dependence of the mean displacement $\langle x\rangle$ along the direction
of the field was also found for the euclidean bias (see Fig. 11),

$$\langle x\rangle \sim \frac{\ln t}{A(E)} \qquad (45)$$

with $A(E)$ from (44) for not too large values of $E$, $E < 0.6$ [20]. In a euclidean
bias field, a random walker can also get stuck in backbends of the shortest path
between two points, as also happens in linear fractal structures such as
self-avoiding random walks or random walks. One can show rigorously [21] that in
these structures (45) holds, and it is an open question how general the simple
logarithmic behavior (45) and (44) is for diffusion in random fractals in the
presence of an external bias field (see also [22]).



7 Numerical Approaches 

In this section we discuss numerical methods for generating large percolation
clusters and for studying random walks on disordered structures. Our description
of the algorithms follows closely [4].

7.1 Generation of Percolation Clusters

Percolation clusters can be generated either by the Leath method [23] or by
the Hoshen-Kopelman algorithm [24], where all sites in the percolation system
belonging to the same cluster are identified. We begin with the Leath method.



Leath Method. In the Leath method [23] (see also [25, 26]), single percolation
clusters are generated in the following way (see Fig. 12). In the first step the
origin of an empty lattice is occupied, and its nearest neighbor sites become
either occupied with probability $p$ or blocked with probability $1 - p$. In the
second step, the empty nearest neighbors of those sites occupied in the step
before are occupied with probability $p$ and blocked with probability $1 - p$. In
each step, a new chemical shell is added to the cluster. The process continues
until no sites are available for growth or the desired number of shells has been
generated.




Fig. 12. The first four steps of the Leath cluster growth method 



Since in the Leath method each site is labeled by its chemical distance from
the origin, the method is particularly useful for studying structural and transport
quantities related to the chemical space.
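The growth rule translates almost line by line into code. The following is a minimal sketch on the square lattice using an unbounded coordinate dictionary instead of a fixed array; the function name, the data layout and the use of Python are illustrative and unrelated to the FORTRAN programs on the diskette.

```python
import random


def leath_cluster(p, max_shells, seed=None):
    """Grow one cluster shell by shell.

    Returns (shell_of, blocked): a dict mapping each occupied site to
    the number of the chemical shell in which it was added, and the set
    of sites that were tested and blocked.
    """
    rng = random.Random(seed)
    shell_of = {(0, 0): 0}
    blocked = set()
    frontier = [(0, 0)]
    for shell in range(1, max_shells + 1):
        new_frontier = []
        for x, y in frontier:
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in shell_of or nb in blocked:
                    continue  # every site is tested only once
                if rng.random() < p:
                    shell_of[nb] = shell
                    new_frontier.append(nb)
                else:
                    blocked.add(nb)
        if not new_frontier:
            break  # no sites left for growth
        frontier = new_frontier
    return shell_of, blocked


full, full_blocked = leath_cluster(1.0, 3, seed=0)
print(len(full))  # 25 sites: all (x, y) with |x| + |y| <= 3
```

Because each site is tested exactly once, the shell number stored for a site equals its chemical distance from the origin, which is the property used in the text.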



Hoshen-Kopelman Method. In the Hoshen-Kopelman algorithm [24], all 
sites in the percolation system are labeled in such a way that sites with the same 
label belong to the same cluster and different labels are assigned to different clus- 
ters. If the same label occurs at opposite sides of the system, an infinite cluster 
exists. In this way the critical concentration can be determined. By counting the 
number of clusters with s sites, we obtain the cluster distribution function. 

The algorithm is quite tricky and we use a simple example to demonstrate it
[4]. Consider the 5 × 5 percolation system in Fig. 13a which we want to analyze.

Beginning at the upper left corner and ending at the lower right corner, we
assign cluster labels to the occupied sites. The first occupied site gets the label 1,
the neighboring site gets the same label because it belongs to the same cluster.
The third site is empty and the fourth one is labeled 2. The fifth site is empty.

In the second line the first site is connected to its neighbor at the top and
is therefore labeled 1. The second site is empty and the third one is labeled 3.
The fourth site is now the neighbor of two sites, one labeled 2 and the other
labeled 3. All three sites belong to the same cluster, which was first labeled 2.
Accordingly, we also assign the label 2 to the new site, but we have to keep
track that clusters 2 and 3 are connected. This is achieved by defining a new
array $N_L(k)$: $N_L(3) = 2$ tells us that the correct label of cluster 3 is 2. If we
continue the labeling we end up with Fig. 13b, with $N_L(3) = 2$ and $N_L(4) = 2$.
Sites labeled 1, 2, 5, and 6 are not connected with sites with lower labels, and
we define $N_L(1) = 1$, $N_L(2) = 2$, $N_L(5) = 5$, and $N_L(6) = 6$.

In the second step (see Fig. 13c) we change the improper labels (where
$N_L(k) < k$) into the proper ones beginning with the lowest improper label (here
$k = 3$) and ending with the largest improper label (here $k = 4$).

The Hoshen-Kopelman algorithm is useful when investigating the distribu- 
tion of cluster sizes as well as the largest cluster in any disordered system, not 
necessarily a percolation system. The method can also be used to determine pc 
and to generate the infinite percolation cluster in percolation, but for this the 
Leath method described above is more efficient. 
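The two-pass labeling can be sketched compactly. The code below follows the description above, with a list `NL` playing the role of the array $N_L(k)$. It is an illustrative sketch, not the original implementation, and the 2 × 5 test grid reproduces only the first two rows of the worked example, with the fifth site of the second row assumed empty.

```python
def hoshen_kopelman(grid):
    """Label clusters of occupied sites (1) on a rectangular grid.

    Returns a grid of the same shape in which sites of the same cluster
    carry the same proper label and empty sites carry 0.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    NL = [0]  # NL[k] == k for proper labels; index 0 is unused

    def proper(k):
        # Follow the chain of improper labels down to the proper one.
        while NL[k] != k:
            k = NL[k]
        return k

    # First pass: provisional labels, recording merges in NL.
    for i in range(rows):
        for j in range(cols):
            if not grid[i][j]:
                continue
            up = proper(labels[i - 1][j]) if i > 0 and labels[i - 1][j] else 0
            left = proper(labels[i][j - 1]) if j > 0 and labels[i][j - 1] else 0
            if not up and not left:
                NL.append(len(NL))  # open a new cluster label
                labels[i][j] = len(NL) - 1
            elif up and left:
                low, high = min(up, left), max(up, left)
                labels[i][j] = low
                NL[high] = low  # the higher label becomes improper
            else:
                labels[i][j] = up or left

    # Second pass: replace improper labels by proper ones.
    for i in range(rows):
        for j in range(cols):
            if labels[i][j]:
                labels[i][j] = proper(labels[i][j])
    return labels


example = [[1, 1, 0, 1, 0],
           [1, 0, 1, 1, 0]]
labeled = hoshen_kopelman(example)
print(labeled)  # [[1, 1, 0, 2, 0], [1, 0, 2, 2, 0]]
```

Counting how often each proper label occurs in the result gives the cluster sizes and hence the cluster distribution function mentioned above.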



7.2 Simulation of Random Walks 

Monte Carlo Method. In this method random walks on random fractals
are simulated by the Monte Carlo technique, and the diffusion exponent $d_w$ is
determined from the mean square displacement. First the fractal structure of
interest is generated and one site is chosen randomly as the origin of the random
walk. Then one of its $z$ neighbor sites is chosen randomly and the random walker
attempts to move to that site. If it does not belong to the fractal the move is
rejected, otherwise the walker moves. In both cases, the time is increased by one
unit. This procedure is repeated until the desired number of time steps has been
performed. At each time step $t$, the square displacement of the random walker
from the origin is recorded. Averages are performed over many starting points and
a large number of fractals.
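A single walk of this kind takes only a few lines. The sketch below assumes the square lattice ($z = 4$) and a cluster stored as a set of occupied coordinates; the function name and the two-site test cluster are illustrative.

```python
import random


def walk_square_displacements(occupied, origin, steps, rng):
    """Random walk with rejected moves on a set of occupied sites.

    Returns the list of squared displacements from the origin after
    each time step. Time advances whether or not the move is accepted.
    """
    x, y = origin
    x0, y0 = origin
    r2 = []
    for _ in range(steps):
        dx, dy = rng.choice(((1, 0), (-1, 0), (0, 1), (0, -1)))
        if (x + dx, y + dy) in occupied:
            x, y = x + dx, y + dy  # accepted move
        r2.append((x - x0) ** 2 + (y - y0) ** 2)
    return r2


rng = random.Random(1)
pair = {(0, 0), (1, 0)}
r2_pair = walk_square_displacements(pair, (0, 0), 50, rng)
print(len(r2_pair), max(r2_pair))
```

To estimate $d_w$ one would average such lists over many origins and clusters and fit the slope of $\log \langle r^2(t)\rangle$ versus $\log t$, which should approach $2/d_w$.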



Exact Enumeration of Random Walks. In this technique [14, 27, 28, 29],
the discretized version of the diffusion equation is solved numerically by iteration
in time, using the fact that the probability of the random walker being at site $\mathbf{r}$
at time $t + 1$ is determined by the probabilities of it being at its nearest neighbor
sites at time $t$,

$$P(\mathbf{r}, t+1) = P(\mathbf{r}, t) + \sum_{\boldsymbol{\delta}} w_{\mathbf{r},\mathbf{r}+\boldsymbol{\delta}}\, [P(\mathbf{r}+\boldsymbol{\delta}, t) - P(\mathbf{r}, t)] . \qquad (46)$$

Here, the sum over $\boldsymbol{\delta}$ is over the $z$ nearest neighbor sites $\mathbf{r}+\boldsymbol{\delta}$ of $\mathbf{r}$, and the
transition probability $w_{\mathbf{r},\mathbf{r}+\boldsymbol{\delta}}$ is $1/z$ when $\mathbf{r}+\boldsymbol{\delta}$ is a cluster site and 0 otherwise.

The iteration starts at time t = 0 when the random walker starts at the 
origin, i.e., P(r, 0) = 1 for r = 0 and P(r, 0) = 0 otherwise, and continues until 
the desired number of time steps is reached. The method is illustrated in Fig. 14. 
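One iteration of (46) can be written as a redistribution of probability. The sketch below assumes the square lattice ($z = 4$) and stores the distribution as a dictionary from sites to probabilities. Each site passes $1/z$ of its probability to every neighboring cluster site and keeps the rest, which is equivalent to (46) because the transition probabilities are symmetric. Names and the test geometry (a straight chain of 41 sites) are illustrative.

```python
def enumerate_step(prob, occupied):
    """Advance the probability distribution by one time step."""
    new = {}
    for (x, y), p in prob.items():
        stay = p
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in occupied:
                new[nb] = new.get(nb, 0.0) + p / 4.0
                stay -= p / 4.0
        new[(x, y)] = new.get((x, y), 0.0) + stay
    return new


# Straight chain embedded in the square lattice.
chain = {(i, 0) for i in range(-20, 21)}
prob = {(0, 0): 1.0}  # walker starts at the origin
for _ in range(10):
    prob = enumerate_step(prob, chain)

total = sum(prob.values())
msd = sum(p * (x * x + y * y) for (x, y), p in prob.items())
print(total, msd)  # 1.0 5.0
```

On the chain only two of the four attempted directions lead to cluster sites, so the mean square displacement grows by 1/2 per time step, giving 5 after ten steps.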




Fig. 14. The evolution of the probability of a random walker for three successive time
steps (panels t = 0, t = 1, and t = 2)



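To obtain the average probability density $\langle P(r,t)\rangle_N$ of Sect. 5.2 one has to
average $P(\mathbf{r},t)$ over all sites $\mathbf{r}$ of the clusters at fixed distance $r$ and over $N$
different cluster configurations. The mean square displacement is obtained from

$$\langle r^2(t)\rangle \sim \int r^{2}\, N(r)\, \langle P(r,t)\rangle_N\, dr \sim t^{2/d_w} ,$$

where $N(r) \sim r^{d_f - 1}$ is the number of cluster sites at distance $r$.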

8 Description of the Programs 

The diskette enclosed with this book contains five FORTRAN programs dealing
with topics discussed above. To compile the programs you need a graphics library
called “Xflib”. This library supplies a set of functions to handle the X window
system and is freely available. To find the nearest FTP server you should use
the archie command. If you want to use a different graphics library or compile
the programs on a PC running MS-DOS you have to replace the library calls by
appropriate ones. All library functions used are listed in the program headers
and can be found and replaced easily. The function XINIT opens a window on the
screen; XBACKSTORE tells the X window server to store the interior of the window
and to restore it automatically if it was hidden by another window. The function
XALLOCCOLOR allocates a color to be used by the program, XSETFGPIXVAL sets the
foreground color, XFILLRECT draws a filled rectangle, XFLUSH flushes the previous
draw commands to the X window server, and XCLOSEDP frees the allocated colors
and closes the window. The functions XBACKSTORE and XFLUSH do not have any
appropriate counterparts on a PC.

The first program EXERCISE1.F draws a square lattice on the screen and
marks sites with probability $p$. With this program you are able to study the
geometrical phase transition in two-dimensional percolation by increasing $p$ from
0 to 1. In addition one can change the system size and study its effect on the
critical probability.

The four remaining programs use the Leath method to grow percolation clusters.
The program EXERCISE2.F generates one cluster and simultaneously displays
the result on the screen. You can expand this program to study the fractal
dimensions $d_f$, $d_\ell$ and $d_{\min}$. To this end you have to calculate the masses $M(r)$ and
$M(\ell)$ inside a certain euclidean distance $r$ and topological distance $\ell$ as well as
the relation between these two types of distances. Because percolation clusters
are random fractals, the fractal dimensions are well defined only for the averages
over an ensemble of configurations. Therefore you have to place the main
Leath algorithm inside a loop and average the measures. Since a power law is
expected for the mass-distance relation, it is advisable to plot $\log M$ versus $\log r$
and $\log \ell$, respectively. For the two distances one can plot $\log \ell$ versus $\log r$. The
slopes of the resulting lines give the desired exponents.
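Extracting an exponent from such a log-log plot amounts to a straight-line fit. The following sketch uses an ordinary least-squares fit and checks it on synthetic, noise-free data with a known exponent; the function name and the test data are illustrative, and real data would be the ensemble-averaged masses described above.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) versus log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    return sxy / sxx


# Synthetic power law M(r) = 3 r^(91/48), used only to test the fit.
radii = [2, 4, 8, 16, 32, 64]
masses = [3.0 * r ** (91.0 / 48.0) for r in radii]
slope = loglog_slope(radii, masses)
print(slope)
```

For measured data the fit should be restricted to the range of distances where the cluster is self-similar, since lattice effects at small $r$ and finite-size effects at large $r$ bend the curve.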

The third program EXERCISE3.F generates a percolation cluster with the
Leath algorithm and a random walk on the resulting cluster. The territory
covered by the random walker is shown. The different colors indicate the number
of time steps the random walker needed to reach the site for the first time. You
can expand the program by numerically measuring the territory as a function of
diffusion time and averaging over ensembles of random walks and clusters. It is
interesting to increase the number of random walkers moving simultaneously on
one cluster and to study its effect on the territory covered by all random walkers
as a function of time.

The two remaining programs EXERCISE4.F and EXERCISE5.F differ only
slightly. Both generate a percolation cluster with the Leath method and calculate
a diffusion process using the method of exact enumeration. By this method you
can directly study the probability of a random walker to be at a certain distance
after a number of time steps. The first program enumerates normal unbiased
diffusion in contrast to the second one, which enumerates topologically biased
diffusion. The strength of the bias is determined by the parameter EPSILON.
For EPSILON equal to 0 one gets the unbiased case; the maximum bias is for
EPSILON equal to 1 or -1 depending on its direction. Both programs display the
probability distribution for a random walker for fixed time. The different colors
indicate the probability on a logarithmic scale. In addition, both programs store
the probability distribution $P(r,t)$ for fixed time $t$ as a function of the distance $r$
in files PROBDIS.EX4 and PROBDIS.EX5, respectively, and the root mean square
displacement for the set of times defined in the header of both programs in files
RMS.EX4 and RMS.EX5, respectively. You can change the parameter EPSILON from
0 to 1 and study its effect on the probability distribution and on the root mean
square displacement. You can also exchange the topological bias with a euclidean
one by simply modifying the transition rates. In addition, for both types of biased
diffusion as well as for the normal random walk, you can modify the programs
to study also the probability distribution in topological space $\langle P(\ell,t)\rangle$ and the
topological mean square displacement.
Abstract. We investigate the Anderson model of localization in order to describe
electronic states in disordered materials. The eigenfunctions of the Anderson Hamiltonian,
which are obtained from direct diagonalization by means of the Lanczos algorithm, are 
analyzed with respect to their spatial distribution. It is demonstrated that the wave 
functions display multifractal behavior up to length scales which are typically beyond 
the sizes of numerically accessible systems. On the enclosed diskette, wave functions 
are provided for different strengths of disorder and the reader is guided to write small 
computer programs in order to derive the characteristic multifractal properties such as 
the generalized dimensions, the mass exponents, and the singularity spectra. Special 
emphasis is laid on programming tricks which either save computer time or increase 
the accuracy. Finally, some results of large-scale numerical investigations by such a 
multifractal analysis are presented in order to demonstrate how this statistical method 
is applied in current research on the electronic properties of disordered materials. 



1 Electronic States in Disordered Systems 

In the beginning of condensed matter physics, the investigation of electronic 
(and other) properties concentrated on perfect crystals, because the transla- 
tional invariance simplifies the calculations significantly and often allows us to 
successfully employ analytical methods. The electronic eigenstates are given by 
Bloch functions which are extended over the whole crystal. They appear homo- 
geneous if one neglects fluctuations on the length scale of the lattice constant. 
Small deviations from the lattice structure of perfect crystals may be treated by 
perturbation theory. In this way it is, for example, possible to deal with a single 
impurity. 

For a sufficiently (energetically) deep impurity, exponentially localized states
are found. These wave functions are thus strongly localized and the electron is
bound at the impurity. If one adds more impurities to the crystal, the exponential 
tails of the wave functions overlap, tunneling is enabled between the impurities, 
and consequently the electrons become less localized, which can be understood 
as a hybridization of energetically close impurity levels or as an orthogonality 
constraint on the wave functions of different impurities. If in the extreme case all 
lattice sites of the crystal are replaced by impurities, the resulting fluctuations 
of the wave functions may be so strong that the exponential decay is completely 
masked. 



* Software included on the accompanying diskette. 
On the other hand, the extended wave functions of the crystal are already
disturbed by a single shallow impurity. If there are more impurities, interference
effects become more significant, and in this case the resulting fluctuations of
the wave functions may completely mask the extended character of the eigenstates.
Nevertheless, even in the extreme case of replacing all lattice sites by
impurities, the wave function may still be extended over the entire system. In 
three-dimensional (3D) samples this is the case for sufficiently weak disorder, 
i.e., for sufficiently shallow impurities. 

It is the purpose of this chapter to present a possible method for describing 
the spatial fluctuations of the wave functions in a quantitative way. For this 
purpose, the probability density of the electronic eigenstates will be used as 
a measure. This measure is demonstrated to show fractal behavior. Different
moments of the measure display different fractal behaviors, which leads to the
concept of multifractality.

In Sect. 2 the Anderson model of localization is introduced on which the sub- 
sequent numerical investigations are based. The diagonalization of the Hamil- 
tonian matrix is discussed and examples of eigenstates are presented. In the 
following sections the multifractal analysis of the wave functions is presented. 
Finally, it is shown how the multifractal analysis can be used to determine the
metal-insulator transition which separates extended and localized wave func- 
tions and thus distinguishes metallic and insulating behavior. In this way the 
statistical analysis of the fluctuations of the wave functions will be shown to 
yield a means of determining the metal-insulator transition which occurs with 
an increase in the disorder or the energy in 3D disordered samples. 



2 The Anderson Model of Localization 



To investigate the electronic states of disordered materials we use a lattice model 
with random potential energies as discussed in the introduction. Of course, this 
is a severe simplification. Amorphous materials and alloys do not show a regular 
lattice structure. However, the Anderson model [1] described below has become 
a paradigm and has been widely used in the study of localization properties.

Starting from a general Hamiltonian $H = p^2/2m + V$ and expressing the
second derivative in the kinetic energy by finite differences, one obtains the
following stationary Schrödinger equation in 1D systems:

$$ -\frac{\hbar^2}{2m}\,\frac{\psi(x+\Delta) - 2\psi(x) + \psi(x-\Delta)}{\Delta^2} + V(x)\,\psi(x) = E\,\psi(x) . \qquad (1) $$



In the lattice model we are not interested in the behavior of the wave function
on length scales smaller than a lattice constant. Thus the smallest distance $\Delta$
which can be used in (1) is the lattice constant $a$. For the wave function $\psi_n$ at
the position $x = na$, i.e., at the $n$th site, one obtains the Schrödinger equation
in site representation

$$ -\frac{\hbar^2}{2ma^2}\,(\psi_{n+1} - 2\psi_n + \psi_{n-1}) + V_n\,\psi_n = E\,\psi_n , \qquad (2) $$

which can be written in a more compact way as

$$ t\,\psi_{n+1} + t\,\psi_{n-1} + \epsilon_n\,\psi_n = E\,\psi_n . \qquad (3) $$

It should be noted that due to the underlying regular lattice the parameters 
t are independent of the lattice site. They are called transfer matrix elements 
because the respective matrix elements reflect the transfer of an electron from 
the nth site to a neighbouring site. As usual we express all energies in units of 
this parameter, i.e., we choose t = 1. 

Besides a constant term, the parameters $\epsilon_n$ contain the potential energies $V_n$.
To simulate the random potential of the impurities these parameters are chosen
independently from a box distribution

$$ P(\epsilon_n) = W^{-1}\,\Theta(W/2 - |\epsilon_n|) . \qquad (4) $$



Here $\Theta$ denotes the step function. The width W of this distribution reflects
the strength of the disorder. Choosing a zero mean value of the distribution
as in (4) fixes the origin of the energy scale. Of course, the box distribution
means a severe simplification of the random potential. A binary distribution,
e.g., would be more realistic for describing binary alloys. On the other hand,
if one takes into account that there may be several different features which
contribute independently to the disorder, then the central limit theorem would 
suggest a Gaussian distribution. Due to its numerical simplicity, however, the 
box distribution has been used in most investigations of the Anderson model. 

The Schrödinger equation (3) can be solved as a recursion equation as in the
chapter by Kramer et al. in this book. Here we solve the corresponding eigenvalue
problem which can be written in matrix representation for a finite system with
N sites as

$$ H\,\psi_i = E_i\,\psi_i . \qquad (5) $$

The diagonalization of the respective Hamiltonian matrix yields the eigenvalues
$E_i$ and corresponding eigenvectors $\psi_i = (\psi_{i1}, \ldots, \psi_{iN})$. It is
easy to construct the matrix from (3). For a chain of five sites it reads, e.g.,

$$ H^{(1)} = \begin{pmatrix}
\epsilon_1 & 1 & 0 & 0 & 1 \\
1 & \epsilon_2 & 1 & 0 & 0 \\
0 & 1 & \epsilon_3 & 1 & 0 \\
0 & 0 & 1 & \epsilon_4 & 1 \\
1 & 0 & 0 & 1 & \epsilon_5
\end{pmatrix} , \qquad (6) $$

where t = 1 was used as discussed above. The nonzero matrix elements $H^{(1)}_{15}$
and $H^{(1)}_{51}$ are due to the employed periodic boundary conditions, i.e., the chain
is closed to a ring so that the first and last sites are neighbors.
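
The construction of the ring matrix (6) from (3) and (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ring_hamiltonian`, the dense list-of-lists storage, and the fixed random seed are choices made here and are not taken from the software on the diskette.

```python
import random

def ring_hamiltonian(n_sites, disorder_w, rng):
    """Dense matrix of (3) with t = 1 on a ring of n_sites sites."""
    half = disorder_w / 2.0
    # Site energies eps_n drawn independently from the box distribution (4).
    eps = [rng.uniform(-half, half) for _ in range(n_sites)]
    h = [[0.0] * n_sites for _ in range(n_sites)]
    for n in range(n_sites):
        h[n][n] = eps[n]
        h[n][(n + 1) % n_sites] = 1.0  # transfer to the next site
        h[n][(n - 1) % n_sites] = 1.0  # transfer to the previous site
    return h

H1 = ring_hamiltonian(5, 1.0, random.Random(0))  # fixed seed, W = 1
for row in H1:
    print(" ".join(f"{x:7.3f}" for x in row))
```

The modulo operation closes the chain to a ring, which produces the two corner elements discussed above. A dense matrix is only sensible for such small examples.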

Fig. 1a,b. Sites of a 5 x 5 square lattice numbered as appropriate for (7). The
boundaries of the unit cells are indicated by thick lines. The dashed and
double-dashed sites reflect the copies of the system in the x and in the y direction,
respectively, due to periodic boundary conditions in (a) and due to helical
boundary conditions in (b)

It is straightforward to generalize the above derivation for higher dimensions.
In 2D samples the kinetic energy contains a second derivative with respect to the
y direction. The respective finite differences lead to another two terms in (2) and
(3) which reflect the transfer to the neighboring sites in the y direction. As an
example we consider a 5 x 5 sample and count the sites as in Fig. 1a. Each row of
this system can be represented by a 5 x 5 matrix $H^{(1)}$ automatically including
the transfer due to the periodic boundary conditions in the x direction. These
matrices yield the diagonal of the Hamiltonian matrix for the 2D sample which
can be written in block form as

$$ H^{(2)} = \begin{pmatrix}
H^{(1)} & 0 & 0 & 0 & 0 \\
0 & H^{(1)} & 0 & 0 & 0 \\
0 & 0 & H^{(1)} & 0 & 0 \\
0 & 0 & 0 & H^{(1)} & 0 \\
0 & 0 & 0 & 0 & H^{(1)}
\end{pmatrix} + \begin{pmatrix}
0 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 \\
1 & 0 & 0 & 1 & 0
\end{pmatrix} = \begin{pmatrix}
H^{(1)} & 1 & 0 & 0 & 1 \\
1 & H^{(1)} & 1 & 0 & 0 \\
0 & 1 & H^{(1)} & 1 & 0 \\
0 & 0 & 1 & H^{(1)} & 1 \\
1 & 0 & 0 & 1 & H^{(1)}
\end{pmatrix} . \qquad (7) $$

Here each 0 represents a 5 x 5 zero matrix and each 1 represents a 5 x 5 unity
matrix. The latter describe the transfer to neighboring sites in the y direction as
can be easily verified by inspection of Fig. 1a. The periodic boundary conditions
in the y direction are reflected by the entries 1 in the lower left and upper right
corner of the matrix in analogy to the above discussion of the respective
matrix elements of $H^{(1)}$.

The generalization to 3D samples can be performed in the same way, putting
five 25 x 25 matrices of the form (7) as blocks onto the diagonal of $H^{(3)}$ and
adding 0 and 1 matrices, yielding

$$ H^{(3)} = \begin{pmatrix}
H^{(2)} & 1 & 0 & 0 & 1 \\
1 & H^{(2)} & 1 & 0 & 0 \\
0 & 1 & H^{(2)} & 1 & 0 \\
0 & 0 & 1 & H^{(2)} & 1 \\
1 & 0 & 0 & 1 & H^{(2)}
\end{pmatrix} . \qquad (8) $$

Now, however, these zero and unity matrices are of size 25 x 25. Altogether we
end up with the 125 x 125 matrix (8), which is a sparse matrix, because only seven
matrix elements in each row and column differ from zero.
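
The count of seven nonzero elements per row can be checked without storing the matrix at all. The following sketch assumes a site numbering `x + 5*y + 25*z` that is consistent with the block structure of (7) and (8); the helper name `periodic_neighbors` is hypothetical.

```python
def periodic_neighbors(site, side):
    """Indices of the six nearest neighbors on a side x side x side lattice
    with periodic boundary conditions; sites are numbered x + side*y + side^2*z."""
    x, y, z = site % side, (site // side) % side, site // (side * side)
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    return [((x + dx) % side)
            + side * ((y + dy) % side)
            + side * side * ((z + dz) % side)
            for dx, dy, dz in steps]

SIDE = 5
N_SITES = SIDE ** 3
# One diagonal element (the site energy) plus one element per distinct neighbor.
nonzeros_per_row = [1 + len(set(periodic_neighbors(n, SIDE))) for n in range(N_SITES)]
print(N_SITES, min(nonzeros_per_row), max(nonzeros_per_row))  # prints: 125 7 7
```

Storing only such neighbor lists (or, even simpler, index offsets) instead of the full 125 x 125 array is what makes much larger samples feasible.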



3 Calculation of the Eigenvectors 

The sparseness of the Hamiltonian matrix makes it ideally suited for the
application of the Lanczos algorithm [2]. This algorithm has been widely used in
recent years and shall not be described in detail here. It is sufficient to note
that it is an iterative procedure which requires the repeated multiplication of
the Hamiltonian matrix with some initial vector. These matrix-vector
multiplications consume most of the computer time. Usually such a step would require
$N^2$ multiplications and additions. Due to the particular shape of the matrix (8)
it is easy to avoid all multiplications with 0 or 1 and all summations of zeros.
Thus in each step only N multiplications and 6N additions are required. This
makes the Lanczos algorithm very effective in the present case, not only as usual
for extreme eigenvalues but also for eigenvalues in the middle of the spectrum,
although in this case a large number of Lanczos iterations is necessary.

In order to increase the efficiency of the matrix-vector multiplication even 
further, especially on vectorizing supercomputers, it has been most useful to 
slightly alter the boundary conditions. The periodic boundary conditions as dis- 
cussed above have to be programmed separately for each row of the 2D sample 
and for each 2D slice of the 3D system, requiring either a relatively large number 
of loops or conditional statements in the program. This can be avoided by em- 
ploying so-called helical boundary conditions. In 2D samples these are achieved 
by shifting the unit cell, which is periodically repeated in the x direction, by one 
lattice constant into the y direction. This is illustrated in Fig. 1b. Now the 5th
site is not a neighbor of site 1 but of site 6. This leads to a much simpler structure
of the first matrix in (7). Instead of five 5 x 5 matrices it can now be written as
one 25 x 25 matrix of the form (6). The second matrix in (7) remains unaltered.
In order to achieve the same advantage in 3D systems, one has to shift the unit
cells, which are periodically repeated in the y direction, by one lattice constant
into the z direction. The resulting matrix has a very simple structure which
can not only be programmed most easily in the matrix-vector multiplication but
which also becomes most efficient on vectorizing computers as mentioned above.
Physically the change of boundary conditions is of no relevance, because it only 
means replacing the originally cubic unit cell by an affinely distorted cell. 
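
With helical boundary conditions every neighbor is reached by a fixed index offset (1, L, or L^2) taken modulo N, so the matrix-vector product needs neither conditional statements nor separate loops per row. The sketch below is one possible reading of that scheme in plain Python; the name `helical_matvec` and the zero-based site numbering are assumptions made here, and a production code would use vectorized array operations instead of an explicit loop.

```python
import random

def helical_matvec(eps, vec, side, dim):
    """Return H*vec for a side**dim lattice with helical boundaries (t = 1)."""
    n = len(vec)
    offsets = [side ** k for k in range(dim)]  # 1, side, side**2, ...
    out = [0.0] * n
    for i in range(n):
        acc = eps[i] * vec[i]  # the only multiplication per site
        for off in offsets:
            # two additions per lattice direction, 6 in total for dim = 3
            acc += vec[(i + off) % n] + vec[(i - off) % n]
        out[i] = acc
    return out

side, dim = 5, 3
n_sites = side ** dim
rng = random.Random(1)
eps = [rng.uniform(-0.5, 0.5) for _ in range(n_sites)]  # W = 1
ones = [1.0] * n_sites
row_sums = helical_matvec([0.0] * n_sites, ones, side, dim)
print(set(row_sums))  # every row holds six unit transfer elements
```

In the 5 x 5 example of Fig. 1b the zero-based index 4 (site 5) obtains index 5 (site 6) as its right neighbor, as described in the text.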

Fig. 2. Probability amplitude of eigenstates of the Anderson Hamiltonian on a square
lattice of N = 200 x 200 sites for energy E = 0 and disorder parameter W = 1. Every
site with a probability larger than average is shown, i.e., $|\psi_n|^2 > 1/N$. Four different
grey levels (j = 0, 1, 2, 3) distinguish whether $|\psi_n|^2 > 2^j/N$



Fig. 3. Same as Fig. 2, but for disorder W = b

Employing the helical boundary conditions in this way it has been possible
[3] to determine eigenstates of 3D samples with up to $N = 74^3$ sites in the
middle of the spectrum, i.e., for eigenenergies close to E = 0. We note that the
number of nonzero elements in each row and column of the Hamiltonian matrix
remains seven in 3D and five in 2D also for the helical boundary conditions.
Typical wave functions for 2D systems are illustrated in Figs. 2 and 3. These plots
demonstrate the fragmentation of the probability density of the wave function.
It is a characteristic feature that the speckles which reflect lumps of high
probability occur on all possible scales and intensities and are separated by openings
or gaps which likewise appear with all sizes. For larger disorder the probability is
concentrated in certain areas. Mandelbrot [4] has coined the illustrative phrase
curdling for this behavior. Strong curdling of the electronic wave function
occurs when the electron is put into a system with too much disorder just in the
same way as old milk curdles when it is poured into coffee which is too hot. As
demonstrated in Figs. 2 and 3, the characteristics of the curdling depend on the
disorder parameter.^ In the following sections a quantitative way of describing
this behavior shall be presented.

^ The data for these wave functions are included on the enclosed diskette together 
with programs for plotting these functions in the same way as in Figs. 2 and 3. In 
the World Wide Web (http://www.tu-chemnitz.de/home/schreiber/springer) data of 
more wave functions for other parameters can be found. 

4 Description of Multifractal Objects

The notion that the eigenstates of disordered systems may be fractal objects 
was first suggested by Aoki [5] who pointed out the possible scale invariance of 
eigenstates at the metal-insulator transition. Approaching the transition from 
the insulating side, the exponential decay length of the wave function diverges.
Similarly, on the metallic side the characteristic length, namely the coherence length,
increases with increasing interference effects and also diverges at the transition. 
Thus at the metal-insulator transition there is no relevant length scale in the sys- 
tem (except the smallest length in the system, the lattice constant, which is of no 
importance in this context). This prompted the idea [5] that the respective wave 
functions are scale invariant and display self-similarity. A quantitative analysis 
of the dependence of the participation number on the system size corroborated 
the fractal behavior and yielded characteristic fractal dimensions not only at the 
metal-insulator transition but also in the neighborhood of the transition [6]. 

The fractality of the wave functions around the transition is an appealing 
concept, because it enables us to assume a continuous transition of the char- 
acteristics of the wave function at the metal-insulator transition. Moreover, it 
allows us to accommodate two seemingly contradictory notions. A localized state
should occupy only an infinitesimal fraction of space even arbitrarily close to the
metal-insulator transition; on the other hand, an extended state is expected to
be spread over the entire system. A filamentary structure over the whole sample
like a mesh with openings on all scales or a curdled structure with lumps of all 
sizes could represent such an effectively extended state which nevertheless does 
not fill any finite fraction of the volume. This is exactly what happens in our 
disordered samples as illustrated in Figs. 2 and 3. 

However, it has become clear in recent years that the simple fractal de- 
scription cannot be sufficient to characterize the electronic states in disordered 
systems. Rather one has to take into account the spatial distribution of the prob- 
ability density of the wave function over the entire system in a quantitative way. 
This shall be explained in the following. 

The fractal picture as it is, e.g., used in Bunde’s chapter of this book, is
based on the analysis of the spatial distribution of a set of points. A prominent
example is given by the set of lattice sites in the Sierpinski gasket. Covering the
set of points with squares of size $\delta^2$ in 2D or cubes of size $\delta^3$ in 3D systems
requires $\mathcal{N}(\delta)$ boxes. Introducing generalized cubes of size $\delta^d$ as test boxes one
can define a measure $M_d = \sum_k \delta^d = \mathcal{N}(\delta)\,\delta^d$ by summing over all boxes k. This
quantity allows us to measure the “mass” of the fractal set. The behavior of the
measure

$$ M_d = \mathcal{N}(\delta)\,\delta^d \;\xrightarrow{\delta \to 0}\; \begin{cases} 0 , & d > D_0 , \\ \infty , & d < D_0 , \end{cases} \qquad (9) $$

defines the fractal dimension $D_0$ of the set of points or lattice sites. In order to
measure the finest details of the set, the boxes should become as small as possible,
which in our case means that the ratio $\delta$ of lattice constant and system size has
to become small. We note that the investigated set may be rigorously defined by
a simple construction law like the Sierpinski gasket, which is exactly self-similar 
on all scales. The set of lattice points may also be defined in a statistical way as 
in the case of random walk trails or percolation clusters (cf. Bunde’s chapter of 
this book) yielding a stochastic fractal in which self-similarity is achieved only 
in a statistical sense. 

However, counting in our case all boxes in which the wave function is nonzero
would mean counting all sites, because due to the random potential the wave
function does not vanish at any site, although it may become exponentially
small. Consequently the simple measure (9) includes all sites and yields the
Euclidean dimension $D_0 = 2$ for the square lattice and $D_0 = 3$ for the cubic
system. Thus $D_0$ is not a particularly useful quantity in our case. For a more
interesting evaluation of the fractal properties we have to take the “contents”
of each box into account. This means that we attribute an appropriate weight
to each box. We consider the so-called box probability, i.e., the probability of
finding an electron in a box of linear size $La$,

$$ \mu_k(L) = \sum_{n(k)} |\psi_{in}|^2 , \qquad k = 1, \ldots, N_L . \qquad (10) $$

Here the summation is performed over the $L^2$ or $L^3$ sites in the kth box, i.e.,
$N_L = N/L^2$ or $N_L = N/L^3$. This quantity, which is normalized because the wave
function is normalized, thus measures the spatial distribution of the probability
density of the wave function. Obviously, it can be a fractal object only in a
statistical sense.
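
The box probability (10) amounts to a plain block sum over the site probabilities. A minimal sketch for a 2D sample follows; the storage convention `prob[x + n_side*y]` and the random stand-in for $|\psi_{in}|^2$ are assumptions made for illustration only, since no real eigenstate is computed here.

```python
import random

def box_probabilities(prob, n_side, box):
    """Sum the site probabilities prob[x + n_side*y] over box x box squares."""
    if n_side % box:
        raise ValueError("box must divide n_side for a non-overlapping cover")
    n_box = n_side // box
    mu = [0.0] * (n_box * n_box)
    for y in range(n_side):
        for x in range(n_side):
            mu[x // box + n_box * (y // box)] += prob[x + n_side * y]
    return mu

# Hypothetical stand-in for |psi_n|^2: random positive weights, normalized.
rng = random.Random(2)
n_side = 12
raw = [rng.random() + 1e-12 for _ in range(n_side * n_side)]
total = sum(raw)
prob = [r / total for r in raw]

mu = box_probabilities(prob, n_side, 3)
print(len(mu), sum(mu))  # N_L = N / L^2 boxes, total probability close to 1
```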

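The scaling of this measure and its qth moments in the limit of small box
sizes $\delta$,

$$ M_d(q, \delta) = \sum_k \mu_k^q(\delta)\,\delta^d \;\xrightarrow{\delta \to 0}\; \begin{cases} 0 , & d > \tau(q) , \\ \infty , & d < \tau(q) , \end{cases} \qquad (11) $$

defines the so-called mass exponents $\tau(q)$. It is sometimes more illustrative to
discuss the generalized dimensions

$$ D(q) = \tau(q)/(1-q) , \qquad (12) $$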

because for a homogeneous distribution of the mass, or in our case a
homogeneous distribution of the probability density of the wave function, all
generalized dimensions equal the Euclidean dimension. If the measured quantity is
distributed homogeneously over a fractal set, then all generalized dimensions
equal the fractal dimension. But for a nonhomogeneous distribution, different
fractal dimensions $D(q)$ occur and we have a multifractal rather than a fractal
object. It should be noted that for q = 0 we always have $D(0) = \tau(0)$, and in this
case the definitions (9) and (11) coincide yielding the fractal dimension $D_0$ of
the underlying lattice which supports the wave function. Therefore $D_0$ is called
the dimension of the support or the similarity dimension, because it reflects the
self-similarity of the support.

The physical meaning of $\tau(q)$ and $D(q)$ is that the different moments of the
mass distinguish intertwined regions of the wave functions which scale in
different ways according to the mass exponents $\tau(q)$. The generalized dimensions
characterize the different fractality of the different subsets of the measure
corresponding to the mentioned intertwined regions of the wave function. Accordingly
the multifractal wave function is not really self-similar, but rather self-affine,
again in a statistical sense.

Another quantity which is often used for a description of multifractal objects
is the Lipschitz-Hölder exponent $\alpha$ which reflects the strength of the singularity
of the box probability in the kth box:

$$ \mu_k(\delta) \sim \delta^{\alpha} . \qquad (13) $$

It describes how the mass in a particular box changes when the size of the box
is reduced. Of course, for a homogeneous quantity one obtains $\alpha = D_0$. The set
containing all boxes in which a particular value $\alpha$ of the singularity strength is
observed is a fractal itself. Its fractal dimension f describes the scaling of the
number of respective boxes

$$ \mathcal{N}(\alpha) \sim \delta^{-f(\alpha)} . \qquad (14) $$

For a homogeneous quantity we obtain only one value $f(D_0) = D_0$. For a
nonhomogeneous quantity the so-called singularity spectrum $f(\alpha)$ results.
Characterizing multifractal objects by the singularity spectrum $f(\alpha)$ is equivalent to
the determination of the mass exponents $\tau(q)$. In fact both quantities are
connected by a Legendre transformation [7]. In particular, one can determine the
singularity strength,

$$ \alpha(q) = -\frac{d\tau(q)}{dq} , \qquad (15) $$

and the singularity spectrum,

$$ f(\alpha(q)) = q\,\alpha(q) + \tau(q) , \qquad (16) $$

in a parametrized way from the mass exponents.



5 Multifractal Analysis of the Wave Functions 

For simplicity the following discussion is formulated for 2D systems. The
generalization to other dimensions is straightforward, but in 2D systems the procedure
can be illustrated more easily.

Fig. 4. Mass exponents $\tau(q)$ of the measure $\mu_k(L)$ for various wave functions in a
2D sample of N = 200 x 200 sites at different values of the disorder W. The error
bars reflect the accuracy of the linear regression. The dash-dotted line shows the mass
exponents for a homogeneously extended wave function

For the multifractal analysis we cover the system with boxes of linear size
$\delta$. Here the size is taken relative to the extent of the system, i.e., $\delta = L/N^{1/2}$.
In our case, in which the wave functions do not vanish exactly at any site, this
means we cover the entire system with squares of size L x L, so that the number
of boxes which are required is fixed as $N_L = N/L^2$. For the so-called
box-counting method we only have to count the number of boxes (cf. (9)). Of course
in our multifractal case we have to take the appropriate weights into account.
For each box the probability of finding the electron in the box is calculated
according to (10). According to (11) the sum of the qth moments over all boxes
should be proportional to $\delta^{-\tau(q)}$. Thus the mass exponents can be determined by
plotting $\sum_k \mu_k^q$ on a doubly logarithmic scale and measuring the steepness of the
curve for small $\delta$. This can be easily performed by fitting a straight line to the
logarithmic data using a standard procedure for linear regression [8]. Of course,
such a routine will always yield some result. Whether this is a meaningful result
has to be checked by ensuring that the data follow a linear behavior. In other
words, it is necessary to check whether the qth moments of the box probability
obey a scaling law at all. This test may seem obvious, but experience shows
that it is sometimes forgotten. Obviously, deviations for the smallest possible
size (the lattice constant, i.e., L = 1) and the largest possible size (the system
size $L = N^{1/2}$) cannot be avoided. This makes the determination of the mass
exponents a numerically delicate task. It is particularly difficult for large negative
values of q for which the moments are dominated by the small amplitudes of
the wave function which are most susceptible to numerical inaccuracies. This is
the reason why we do not include data for q < -2 in the following.
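
The regression described above can be sketched as follows. Because no eigenstate is available here, the example uses a deterministic multiplicative cascade as a synthetic test measure (an assumption of this sketch, not part of the chapter); for such a cascade with weights $w_1, \ldots, w_4$ the scaling is exact and $\tau(q) = \log_2 \sum_i w_i^q$, so the fit can be checked. All function names are hypothetical.

```python
import math

def cascade_measure(levels, weights):
    """Synthetic multifractal on a 2^levels x 2^levels lattice: each site gets
    the product of one of four weights per binary digit of its (x, y) position."""
    n_side = 2 ** levels
    prob = [0.0] * (n_side * n_side)
    for y in range(n_side):
        for x in range(n_side):
            p = 1.0
            for b in range(levels):
                p *= weights[((x >> b) & 1) + 2 * ((y >> b) & 1)]
            prob[x + n_side * y] = p
    return n_side, prob

def box_probabilities(prob, n_side, box):
    n_box = n_side // box
    mu = [0.0] * (n_box * n_box)
    for y in range(n_side):
        for x in range(n_side):
            mu[x // box + n_box * (y // box)] += prob[x + n_side * y]
    return mu

def fit_slope(xs, ys):
    """Least-squares slope of ys versus xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)

def mass_exponent(prob, n_side, q, boxes):
    log_delta, log_moment = [], []
    for box in boxes:
        mu = box_probabilities(prob, n_side, box)
        log_delta.append(math.log(box / n_side))
        log_moment.append(math.log(sum(m ** q for m in mu)))
    # sum_k mu_k^q ~ delta^(-tau(q)), hence tau is minus the fitted slope
    return -fit_slope(log_delta, log_moment)

weights = (0.4, 0.3, 0.2, 0.1)  # must sum to 1 so that the measure is normalized
n_side, prob = cascade_measure(5, weights)
boxes = [1, 2, 4, 8, 16]
for q in (-2, 0, 1, 2):
    tau = mass_exponent(prob, n_side, q, boxes)
    exact = math.log2(sum(w ** q for w in weights))
    print(q, round(tau, 6), round(exact, 6))
```

For a real wave function the points for the smallest and largest box sizes would deviate from the straight line, and the quality of the fit has to be inspected as emphasized in the text; the cascade is free of such deviations by construction.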

Typical examples for the mass exponents are shown in Fig. 4 for different
values of the strength of the disorder. With increasing disorder the deviations
from the straight line $\tau(q) = 2(1-q)$, which describes a homogeneous object in
2D systems, become more pronounced. The linear behavior for large positive and
negative values of the parameter q is a typical feature of multifractal objects.
The values $\tau(q=0) = 2$ and $\tau(q=1) = 0$ coincide for all disorder values due to
the 2D support and due to the normalization of the wave functions, respectively.

Fig. 5. Generalized dimensions $D(q)$ of the measure $\mu_k(L)$ for various wave functions
in a 2D sample of N = 200 x 200 sites at different values of the disorder W. The
error bars reflect the accuracy of the linear regression. The dash-dotted line shows the
dimension of a homogeneously extended wave function

The generalized dimensions can be easily calculated according to (12) from
the mass exponents. Results are shown in Fig. 5. A homogeneous object would
yield $D(q) = 2$ in 2D systems. With increasing disorder the curves in Fig. 5
show stronger deviations from this constant. The overall shape of the curves is
again typical for multifractal objects. The asymptotically constant behavior for
$q \to \pm\infty$ corresponds to the linear behavior of the mass exponents. For large
positive and negative q the steepness of $\tau(q)$ equals the minimal and maximal
generalized dimensions and, according to (15), the minimal and maximal values
of the singularity strength $\alpha$, i.e., $\alpha_{\min} \le D(q) \le \alpha_{\max}$. These values correspond
to the subset where the measure and thus the wave function is most concentrated
or rarefied, respectively.

Fig. 6. Singularity spectrum $f(\alpha)$ of the measure $\mu_k(L)$ for various wave functions in a
2D sample of N = 200 x 200 sites at different values of the disorder W. Integer values
of the implicit parameter q are indicated by symbols for -4 < q < 10. The dash-dotted
line is given by $f(\alpha) = \alpha$, which is a tangent of the singularity spectrum for every wave
function

In principle, the singularity spectrum can be computed analogously. Replacing
the exponent $\alpha$ in (13) by the coarse-grained Hölder exponent

$$ \alpha_k = \log\mu_k(\delta) / \log\delta , \qquad (17) $$

which is equivalent to (13) in the limit $\delta \to 0$, one attributes the singularity
strength $\alpha_k$ to each box. Then one counts the number $\mathcal{N}(\alpha)$ of boxes which
occur for each singularity strength or rather which yield a value of $\alpha$ within a
small interval. The resulting histogram (hence the name histogram method) is
used to determine the singularity spectrum according to (14) as

$$ f(\alpha) = -\log\mathcal{N}(\alpha) / \log\delta \qquad (18) $$

in the limit $\delta \to 0$. Typical results are presented in Fig. 6. The displayed shape
is again characteristic for a multifractal object. It is a convex curve with its
maximum given by the fractal dimension of the support of the measure, i.e.,
$\max f(\alpha) = D_0 = 2$. Around the maximum each curve can be approximated by a
parabola. In the limit of large positive or negative q the singularity spectrum $f(\alpha)$
should approach zero with infinite slope. Due to numerical difficulties the curves
do not reach the $\alpha$ axis. But the corresponding minimal and maximal values
of $\alpha$ have already been derived above as the extreme values of the generalized
dimensions. Accordingly the width of the singularity spectrum increases with
increasing disorder as can be seen in Fig. 6. We also note in Fig. 6 that the
maximum of the singularity spectrum moves towards larger values of $\alpha$ with
increasing disorder. The dash-dotted line in Fig. 6 is of interest in so far as it
denotes the line $f(\alpha) = \alpha$ which can be shown [9] to be a tangent of every
singularity spectrum at $f(\alpha(1)) = \alpha(1) = D(1)$. The dimension $D(1)$ is called the information
or entropy dimension because it describes the scaling of the entropy S of the
measure, which is defined as usual in statistical physics from the box probabilities
as

$$ S = -\sum_k \mu_k \ln\mu_k . \qquad (19) $$

It can be seen from (12) by expanding $\mu_k^q$ around q = 1 that

$$ D(1) = \lim_{\delta \to 0} -S/\ln\delta . \qquad (20) $$
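
The expansion leading to (20) can be spelled out in three lines. This is a sketch which assumes the scaling $\sum_k \mu_k^q \sim \delta^{-\tau(q)}$ implied by (11) and uses only the normalization $\sum_k \mu_k = 1$ and the definition (19) of the entropy S:

```latex
\begin{align*}
\sum_k \mu_k^q &= \sum_k \mu_k \, e^{(q-1)\ln\mu_k}
  \approx 1 + (q-1)\sum_k \mu_k \ln\mu_k = 1 - (q-1)\,S , \\
-\tau(q)\,\ln\delta &\approx \ln\bigl[\,1 - (q-1)\,S\,\bigr] \approx -(q-1)\,S , \\
D(q) &= \frac{\tau(q)}{1-q} \approx -\frac{S}{\ln\delta}
  \qquad (q \to 1) .
\end{align*}
```

As a consistency check, a homogeneous 2D measure with $\mu_k = \delta^2$ in each of the $\delta^{-2}$ boxes gives $S = -2\ln\delta$ and therefore $D(1) = 2$.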

One can show that in the limit of infinite system size the entire measure is 
concentrated into a fractal set with this dimension. For completeness, we note 
that the dimension D(2) is called the correlation dimension, because it reflects
the scaling of the density-density correlation function. Higher correlations are 
characterized by larger values of q. 

6 Computation of the Multifractal Characteristics 

The above-described method of computing the multifractal characteristics can
be easily applied and can be programmed in a straightforward way. Two wave
functions are supplied on the enclosed diskette. The reader is encouraged to
calculate $\tau(q)$, $D(q)$, and $f(\alpha)$ as a simple exercise. However, the convergence is
usually not very good and it is advisable to try to improve the accuracy. One way
in which this can be done is by averaging over different choices of the division of
the system into the $N_L$ boxes. Let us consider a 6 x 6 unit cell as an example.
There is a natural way of dividing it into four boxes of size 3 x 3. However, due to
the periodic boundary conditions there are eight further, equally allowed divisions which
one obtains by shifting the “natural” boxes one or two sites into the x direction
and/or one or two sites into the y direction. This means that we can use 36
values for $\mu_k$ instead of four values, thus improving the statistics significantly.
Of course we have to take into account that every site is included nine times so
that an appropriate normalization is necessary.

Another important aspect of this consideration is that it also allows us to use
box sizes for which $N_L$ is not an integer, i.e., boxes which cannot be used in a
straightforward way to cover the whole system without overlap or holes. Let us
consider again as an example the 6 x 6 system which cannot be covered directly
with 5 x 5 boxes. However, if we take into account all possible 36 nonequivalent
positions for placing a 5 x 5 box in the periodically repeated system (shifting the
box up to five sites in the x and/or in the y direction) we include every lattice
site exactly 25 times. This yields the normalization in analogy to the previous
example of 3 x 3 boxes. There is another way of visualizing this average. If one
periodically repeats the original 6 x 6 system four times in the x direction and
the resulting 30 x 6 strip four times in the y direction, the resulting 30 x 30
system can be divided into 5 x 5 boxes in a natural way. The 36 boxes thus
obtained are equivalent to the 36 shifted boxes. In this way many more data
points can be computed for the linear regression for the evaluation of (11) as
well as for the calculation of (17) and (18), because all integer values of L are
now allowed.

If the number of data points is still not sufficient for a statistically reasonable
result, another improvement can be made by allowing slightly rectangular boxes,
e.g., of size 2 x 3 and 3 x 2. In this way two more data points for an effective
size $L = 6^{1/2}$ are obtained.
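
The averaging over shifted boxes can be sketched as follows for square boxes. The function name `shifted_moment` and the random stand-in for the site probabilities are assumptions of this illustration. Every one of the N box origins is used once, and the sum is divided by $L^2$ because every site is then contained in $L^2$ boxes.

```python
import random

def shifted_moment(prob, n_side, box, q):
    """Estimate sum_k mu_k^q for box size `box` by averaging over all shifted
    box positions in the periodically repeated n_side x n_side system.
    Assumes strictly positive site probabilities when q is negative."""
    total = 0.0
    for y0 in range(n_side):
        for x0 in range(n_side):
            mu = 0.0
            for dy in range(box):
                row = n_side * ((y0 + dy) % n_side)
                for dx in range(box):
                    mu += prob[row + (x0 + dx) % n_side]
            total += mu ** q
    return total / (box * box)  # each site is included box*box times

# Hypothetical stand-in for |psi_n|^2 on the 6 x 6 example of the text.
rng = random.Random(3)
n_side = 6
raw = [rng.random() + 1e-12 for _ in range(n_side * n_side)]
norm = sum(raw)
prob = [r / norm for r in raw]
for box in (1, 2, 3, 4, 5, 6):
    print(box, shifted_moment(prob, n_side, box, 1), shifted_moment(prob, n_side, box, 2))
```

For q = 1 the result is 1 for every box size, which confirms the normalization, and for q = 0 it equals the possibly non-integer number of boxes $N/L^2$. The direct quadruple loop is kept for readability; running sums would be faster for large systems.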

The reader is encouraged to use these tricks in order to improve the statistics 
of the simple exercise program. It is also interesting to note that the derivation 
of the singularity spectrum from the mass exponents according to (15) and (16) 
is not very accurate due to the necessary numerical derivative. This can be seen 
by comparing the numerically obtained results of (15) and (16) with the results 
of (18). 

For a more accurate computation of the singularity spectrum it turned out
to be better to use the qth moment of the measure in the separate boxes which,
properly normalized,

$$ \mu_k(q, L) = \mu_k^q(L) \Big/ \sum_{k'=1}^{N_L} \mu_{k'}^q(L) , \qquad (21) $$

constitutes a measure itself. In the method of moments this measure is used to
directly calculate the Lipschitz-Hölder exponents

$$ \alpha(q) = \lim_{\delta \to 0} \sum_k \mu_k(q, L) \log\mu_k(1, L) / \log\delta \qquad (22) $$

and the corresponding value of the singularity spectrum

$$ f(q) = \lim_{\delta \to 0} \sum_k \mu_k(q, L) \log\mu_k(q, L) / \log\delta \qquad (23) $$

in a parametric representation. These formulae can be evaluated by a standard
linear regression, determining the steepness of $\sum \mu\log\mu$ versus $\log\delta$. Of course,
the above-explained tricks to improve the accuracy by averaging over different
nonequivalent divisions of the system into boxes, by considering box sizes which
do not naturally fit into the system size, and by taking rectangular boxes into
account can all be applied for the evaluation of (22) and (23), too.
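
A compact sketch of the method of moments is given below. As no wave function is at hand, it again uses a deterministic multiplicative cascade as a synthetic test measure (an assumption of this example, with hypothetical function names), for which $\tau(q) = \log_2 \sum_i w_i^q$ is known exactly. The sums in (22) and (23) are accumulated for several box sizes and their steepness versus $\log\delta$ is obtained by linear regression.

```python
import math

def cascade_measure(levels, weights):
    """Synthetic multifractal test measure on a 2^levels x 2^levels lattice."""
    n_side = 2 ** levels
    prob = [0.0] * (n_side * n_side)
    for y in range(n_side):
        for x in range(n_side):
            p = 1.0
            for b in range(levels):
                p *= weights[((x >> b) & 1) + 2 * ((y >> b) & 1)]
            prob[x + n_side * y] = p
    return n_side, prob

def box_probabilities(prob, n_side, box):
    n_box = n_side // box
    mu = [0.0] * (n_box * n_box)
    for y in range(n_side):
        for x in range(n_side):
            mu[x // box + n_box * (y // box)] += prob[x + n_side * y]
    return mu

def fit_slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)

def alpha_f(prob, n_side, q, boxes):
    """Return (alpha(q), f(q)) from regressions of the sums in (22) and (23)."""
    log_delta, a_sum, f_sum = [], [], []
    for box in boxes:
        mu = box_probabilities(prob, n_side, box)
        moments = [m ** q for m in mu]
        norm = sum(moments)
        mu_q = [w / norm for w in moments]  # normalized qth moment, (21)
        log_delta.append(math.log(box / n_side))
        a_sum.append(sum(p * math.log(m) for p, m in zip(mu_q, mu)))
        f_sum.append(sum(p * math.log(p) for p in mu_q))
    return fit_slope(log_delta, a_sum), fit_slope(log_delta, f_sum)

weights = (0.4, 0.3, 0.2, 0.1)
n_side, prob = cascade_measure(5, weights)
q = 2.0
alpha, f = alpha_f(prob, n_side, q, [1, 2, 4, 8, 16])
tau = f - q * alpha  # mass exponent recovered via (16)
print(alpha, f, tau, math.log2(sum(w ** q for w in weights)))
```

No numerical derivative is needed in this approach, which is the reason given in the text for preferring it over the evaluation of (15) and (16) from fitted mass exponents.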

In fact, the data in Fig. 6 have been obtained in this way. The mass exponents 
can then be easily calculated from (16) and the generalized dimensions from (12), 
yielding Figs. 4 and 5. These data are also available on the enclosed diskette, 
together with plot programs to display the data in different combinations. We 
have also added data for different system sizes and for different values of the 
random potential energies. Comparing these data in the plots, the finite-size 
effects and the dependence on the actual values of the random numbers can be 
estimated. 

In actual applications it is necessary to average over several realizations of the
random potential. It is useful to note one more hint in this case: it is advantageous
to average the sums over $\mu\log\mu$ in (22) and (23) and to perform the linear
regression afterwards for the averaged data instead of averaging the results of the 
linear regression for different realizations of the random potential. Usually one 
also performs an average over several eigenstates which are close in energy. This 
average is numerically less expensive, because the eigenstates can be computed 
in one run of the Lanczos procedure. Nevertheless, to obtain sufficiently accurate 
values for reasonably large systems an unpleasantly large amount of computer 
time is necessary. 

7 Topical Results of the Multifractal Analysis 

Following Aoki’s suggestion [5] that the eigenstates of disordered systems should 
be scale invariant at the metal-insulator transition, the fractal character of the 
electronic wave functions of the Anderson Hamiltonian has been demonstrated 
by showing the power-law dependence of the participation number on the sys- 
tem size [6,10]. The respective fractal dimension coincides with the correlation 
dimension $D(2)$, which has also been derived directly from an analysis of the 
density-density correlation of the wave function [11]. Other generalized dimen- 
sions have been computed at the metal-insulator transition in 3D systems [12,13] 
as well as for the critical states in the lowest Landau band of a 2D system under 
the influence of a strong magnetic field [14] and in 2D systems with spin-orbit 
coupling [15]. These systems display a metal-insulator transition in 2D, too, and 
can be investigated in terms of the Schrödinger equation (3) with appropriately 
chosen transfer integral t. 

When it became clear that the simple fractal picture was not sufficient, it 
was found that the multifractal character of the electronic states at the metal- 
insulator transition could be established for the 3D Anderson model [16] as well 
as for the 2D model with strong magnetic field [17]. It is interesting to note that 
the resulting singularity spectrum for the 2D tight-binding system coincides with 
that for the continuum model of 2D samples with magnetic field [18]. Likewise, 
the singularity spectrum of the 3D model (3) with Gaussian distribution of the 
random potential of the impurities was found [19] to be the same as for the model 
with box distribution (4) of the potential energies when both models are eval- 
uated at the metal-insulator transition which is known [20] in the band centre, 
i.e., for E = 0, from respective studies by means of the transfer-matrix method 
(cf. the chapter of Kramer et al. in this book) to occur at Wc = 16.5 (21.0) for 
the box (Gaussian) distribution. These observations stimulated the conclusion 
[19] that a characteristic singularity spectrum is universal for the metal-insulator 
transition regardless of the specific parameters of the model (except the dimen- 
sionality). Around its maximum the characteristic singularity spectrum in 3D 
was also shown to agree with the parabolic approximation 

$$f(\alpha) = 3 - (4 - \alpha)^2/4 , \qquad (24)$$

which was obtained from the $2 + \epsilon$ expansion within the nonlinear $\sigma$ model [21]. 

Taking into account the pronounced dependence of the singularity spectrum 
on the disorder that can be seen in Fig. 6, the comparison with the charac- 
teristic singularity spectrum for the transition can then be used as a tool [22] 
to distinguish localized and extended states. As it is impractical to compare 
the entire singularity spectrum, two specific points of the spectrum have been 
used for this purpose, namely $\alpha(0)$, the position of the maximum of $f(\alpha)$, and 
$\alpha(1) = f(\alpha(1)) = D(1)$. 
Both show a pronounced dependence on the system parameters: $\alpha(0)$ increases, 
$\alpha(1)$ decreases with increasing disorder and/or energy. Taking only the 
transfer-matrix-method value of the critical disorder ($W_c = 16.5$) in the band centre 
($E = 0$) as input parameter, the critical values at the metal-insulator transition 
have been determined as $\alpha_c(0) = 4$ and $\alpha_c(1) = 2$. These values were 
then employed to distinguish localized and extended wave functions for various 
parameter combinations of energy and disorder. As a result the phase diagram 
in the energy-disorder parameter space could be determined for the box, the 
Gaussian, and the binary distribution of random potential energies [20]. For all 
distributions the entire trajectory of the metal-insulator transition was in good 
agreement with respective results from the transfer-matrix method. Already for 
very small systems with $20^3 = 8000$ sites the accuracy of the multifractal analysis 
is sufficient to obtain the phase boundary, whereas the system size for the 
mentioned calculations by means of the transfer-matrix method was three orders 
of magnitude larger. 
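
As a consistency check, the two critical values quoted above can be recovered from the parabolic approximation (24). The short derivation below assumes that (24) may be used over its full range, although it was only stated to hold around the maximum:

$$\frac{df}{d\alpha} = \frac{4 - \alpha}{2} = 0 \;\Rightarrow\; \alpha_c(0) = 4 , \quad f(\alpha_c(0)) = 3 ,$$

$$\alpha = 3 - \frac{(4 - \alpha)^2}{4} \;\Leftrightarrow\; (\alpha - 2)^2 = 0 \;\Rightarrow\; \alpha_c(1) = 2 .$$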

For the relatively small systems it is not surprising that multifractal charac- 
teristics can also be derived for wave functions not exactly at the metal-insulator 
transition but in its neighborhood. In this case, however, the wave functions 
should display scale invariance not on all scales but only up to the correlation 
length or the localization length. Close to the transition, these lengths are much 
larger than the size of the investigated systems so that the multifractal analysis 
is feasible. As a consequence, however, the singularity spectrum shows a specific 
dependence on the size N of the system. With increasing system size the singu- 
larity spectrum for the extended wave functions becomes narrower and narrower 
in agreement with the expectation that for infinite systems the singularity spectrum 
$f(\alpha)$ of an extended state would consist only of one point $f(D_0) = D_0$. On 
the other hand, for localized states the singularity spectrum would consist of two 
points: $f(0) = 0$ would reflect the localization centre, whereas $f(\infty) = D_0$ would 
represent the large part of the system in which the wave function is effectively 
zero. Accordingly, in the regime of localized states the singularity spectrum be- 
comes broader and broader with increasing system size. Only exactly at the 
metal-insulator transition does the singularity spectrum not change upon al- 
tering the system size. Thus the analysis of the dependence of the singularity 
spectrum on the system size enables us to distinguish localized and extended 
states. Again it is impractical to analyze the entire singularity spectrum. Instead 
we have again evaluated the data for $\alpha(0)$ and $\alpha(1)$. The results agree 
with the above-described evaluation of the disorder and/or energy dependence 
of the singularity spectrum [20]. However, to obtain sufficiently accurate data for 
the system-size dependence of the singularity spectrum a much larger numerical 
expense is necessary. Therefore this method is impractical for a determination 
of the complete phase boundary. But it is conceptually satisfying, because it 
yields a possibility of determining the characteristic singularity spectrum for 
the metal-insulator transition without relying on input data from the transfer- 
matrix method. 

The discussed investigations have been concerned with the multifractal prop- 
erties of single eigenstates. Transport, however, is usually not connected with 
single eigenstates, but is rather concerned with wave packets traveling through 
the system. Such a wave packet can be constructed by linear combination from 
a large number of eigenstates within a given energy range. A priori, it is by no 
means clear that such a linear combination maintains the multifractal proper- 
ties. Looking at Figs. 2 and 3, one can easily imagine that the sum of several 
wave functions, the curdling of which occurs at different positions in the system, 
yields a less fragmented picture. (It is of course a significant difference whether 
one linearly combines the wave functions before calculating a box probability in 
(10) or whether one averages the results of the singularity spectrum over several 
different wave functions as discussed in Sect. 6.) A detailed analysis, however, has 
shown [23] that a linear combination of a large number of wave functions does 
not give a homogeneous distribution of the probability amplitude, but rather 
yields the same singularity spectrum and consequently the same set of generalized 
dimensions as the original wave functions. Therefore the characteristics of a 
single eigenstate are sufficient to analyze the influence of multifractal properties 
on the transport properties of electrons in disordered systems [23]. 

In conclusion, we have shown that the multifractal analysis of the wave func- 
tions yields a straightforward and independent way of determining the trajectory 
of the metal-insulator transition, i.e., the phase boundary in the energy-disorder 
diagram. The large-scale numerical investigation was concentrated on two points, 
$\alpha(0)$ and $\alpha(1)$, of the singularity spectrum. In principle one can apply the method 
to any other point $\alpha(q)$ or $f(q)$, but the numerical accuracy will be more difficult 
to achieve for large $q$, especially for large negative values of $q$. 

Conceptually the multifractal method is appealing, because it 
exploits profitably the strong fluctuations of the wave functions which hitherto 
were considered a nuisance in the numerical investigations. From a physical point 
of view it is interesting that these fluctuations, which are evaluated on relatively 
short scales, can be successfully used to determine the metal-insulator transition 
which reflects the long-range behavior of the wave functions. The reason for this 
unexpected success is of course the scale invariance of the wave functions at the 
metal-insulator transition, where the characteristic fluctuations are independent 
of the system size and therefore show up in small systems in a characteristic way. 

Abstract. In this chapter we introduce recursive scaling methods for the calculation 
of electronic properties of disordered systems. First we introduce in some detail the 
basic physical concept by considering the one-dimensional limit. Then, we describe 
the recursive procedure and demonstrate the finite-size scaling method for analyzing 
the data. Finally, we briefly describe the status of the results for the disorder-induced 
metal-insulator transition. 



1 Introduction 

Real solids, such as metals, alloys, glasses, and doped semiconductors, always 
contain a certain degree of disorder induced by impurities, defects, and/or dis- 
locations. The arrangement of the atomic constituents is far from being ideally 
ordered. Their physical properties are therefore to a considerable extent determined 
by “randomness”. The understanding of the latter is of immense practical 
importance. Detailed knowledge of material properties in general, and of the 
electrical and magnetic properties of thin films in particular, is an indispensable ingredient 
of everyday technology in practically all areas of our modern civilization. Its 
most recent achievement, communication technology, is hard to imagine without 
understanding the transport behavior of electrons in the disordered potential 
landscape of doped semiconductors. 

Methods that are borrowed from the theory of the random spectra of heavy 
atomic nuclei can also be used to treat certain questions of a fundamental nature 
in connection with the more or less random electronic spectra of heavy atoms 
and large molecules, and of noninteracting disordered systems. As all of these are 
basically dominated by the laws of quantum physics, the motion of a particle in 
a random potential can be considered as one of the paradigmatic quantum problems, 
of outstanding practical as well as fundamental importance. The 
most striking example is an electron moving in a random potential. The understanding 
of its quantum properties not only yields the key to the understanding 
of the metallic and insulating nature of matter; it is also a crucial ingredient for 
modern electronic devices. In this chapter we concentrate on this example. We 
will especially deal with the phenomenon of localization which is closely related 
to the electronic transport properties [1]. 

Essentially, we have to solve the stationary Schrödinger equation for a single 
particle, 

$$\left[ -\frac{\hbar^2}{2m} \nabla^2 + V(\mathbf{x}) \right] \psi(\mathbf{x}) = E\, \psi(\mathbf{x}) . \qquad (1)$$

Here, $V(\mathbf{x})$ is the random potential energy, $m$ the (effective) mass of the particle, 
and $E$ the energy of the stationary wave function $\psi(\mathbf{x})$. For numerical purposes, 
the lattice equivalent of (1) is more convenient, 

$$\sum_{n'} t_{nn'} \psi_{n'} + \varepsilon_n \psi_n = E\, \psi_n . \qquad (2)$$

As discussed in this volume in the chapter by Schreiber, the first term reflects 
the kinetic energy in (1), the second comprises the potential energy. The matrix 
elements $t_{nn'}$ describe the transitions of the particle between the lattice sites, $\varepsilon_n$ 
are the site energies. For the sake of simplicity, we assume in the following that 
randomness is incorporated only in the latter. They are chosen independently 
at random according to some distribution $p(\varepsilon)$ with a vanishing mean and a 
variance $\sigma^2$. 

Qualitatively, the properties of such a model can easily be understood. When 
the variance of the potential vanishes the Schrödinger equation describes an 
ordered lattice. The energy eigenvalues form bands of finite widths, the corre- 
sponding eigenstates are Bloch states. They extend throughout the whole infinite 
lattice. An electron in such a state - having an energy E and a momentum k 
- is spread across the infinite system. Such an electron will be highly mobile, 
and contribute to the transport of charge. For nonzero fluctuations of the po- 
tential, the bands of energy eigenvalues are broadened. Near the band edges, 
states are formed which are spatially localized within finite regions of accidental 
potential wells. Electrons in these localized states are spatially confined. They 
“cannot move” and do not contribute to charge transport. One of the questions 
to ask in this context is what the precise conditions are under which localized 
and extended states occur in a random system. Another one is how physical 
quantities behave near the localization-delocalization transition. The theory of 
localization deals with these questions. The recursive numerical scaling methods 
to be described below can provide quantitative answers. 

In the next section we treat the example of a one-dimensional (1D) disordered 
lattice model in some detail [2, 3, 4]. The results provide the basic ingredients 
for the numerical procedures. 

2 One-Dimensional Systems 

The quantum mechanics of an electron in a 1D random potential constitutes the 
paradigm of localization theory. It can be shown in a mathematically rigorous 
way that all eigenstates of the Hamiltonian are exponentially localized in the 
thermodynamic limit. 

2.1 The Transfer Matrix 

For simplicity, we assume that the kinetic energy term in (2) contains only 
hopping processes between nearest neighbors. We choose t = 1, fixing the energy 
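scale. Then the Schrödinger equation may be rewritten as 

$$\psi_{n+1} = (E - \varepsilon_n)\psi_n - \psi_{n-1} . \qquad (3)$$

Together with the identity $\psi_n = \psi_n$ this can be cast into the matrix notation 

$$\begin{pmatrix} \psi_{n+1} \\ \psi_n \end{pmatrix} = \begin{pmatrix} E - \varepsilon_n & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \psi_n \\ \psi_{n-1} \end{pmatrix} \equiv T_n \begin{pmatrix} \psi_n \\ \psi_{n-1} \end{pmatrix} . \qquad (4)$$

The random matrices $T_n$ transfer amplitudes between the sites $n, n-1$ and 
$n+1, n$. Equation (4) may be solved recursively for arbitrary initial conditions 
$\psi_1$ and $\psi_0$. The amplitudes at $N+1$, $N$ are 

$$\begin{pmatrix} \psi_{N+1} \\ \psi_N \end{pmatrix} = T_N T_{N-1} \cdots T_1 \begin{pmatrix} \psi_1 \\ \psi_0 \end{pmatrix} . \qquad (5)$$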

The calculation of the amplitudes is thus equivalent to the calculation of the 
product of random matrices, 



$$\mathcal{T}_N = \prod_{n=1}^{N} T_n . \qquad (6)$$

The important point now is that there exists a mathematical theorem due to 
Oseledec [5] which guarantees the existence of the product in the thermodynamic 
limit, 

$$\Gamma = \lim_{N \to \infty} \left( \mathcal{T}_N^\dagger \mathcal{T}_N \right)^{1/2N} , \qquad (7)$$

with eigenvalues $e^{\gamma_m} < \infty$ ($m = 0, 1$), where $\gamma_m$ are the Lyapunov exponents. 
Physically, the theorem of Oseledec means that when starting from one end of 
the system, the amplitudes can be written as a superposition of exponentially 
increasing and decreasing functions, since $\gamma_0 = -\gamma_1$ due to the symplecticity¹ 
of the matrix (6). For definiteness we select the label so that $\gamma_0 < 0 < \gamma_1$. For 
large distances from the origin, the behavior of the amplitude will be dominated 
by the exponentially increasing component. 

We will now argue that $\gamma_1$ determines the exponential decay of the eigenstates 
such that the localization length may be defined by 

$$\lambda = -\gamma_0^{-1} = \gamma_1^{-1} . \qquad (8)$$
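
A minimal Python sketch of how the positive Lyapunov exponent can be estimated from the recursion (3). It rests on assumptions that are not spelled out at this point of the text: the site energies are drawn from a box distribution of width `W`, the amplitude pair is renormalized after every step to avoid overflow (one common choice), and the function name is hypothetical.

```python
import math
import random

def lyapunov_exponent(E, W, n_sites, seed=0):
    """Estimate gamma_1 as the mean logarithmic growth rate of (psi_{n+1}, psi_n)."""
    rng = random.Random(seed)
    psi_prev, psi = 0.0, 1.0              # initial conditions psi_0 = 0, psi_1 = 1
    log_norm = 0.0
    for _ in range(n_sites):
        eps = rng.uniform(-W / 2, W / 2)  # box-distributed site energy (assumption)
        psi_prev, psi = psi, (E - eps) * psi - psi_prev   # recursion (3)
        norm = math.hypot(psi, psi_prev)
        psi, psi_prev = psi / norm, psi_prev / norm       # keep the pair at unit length
        log_norm += math.log(norm)
    return log_norm / n_sites

gamma = lyapunov_exponent(E=0.0, W=2.0, n_sites=200000)
localization_length = 1.0 / gamma         # eq. (8)
```

Because only the accumulated logarithm of the norm is stored, the chain length is limited by run time rather than by floating-point range.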

¹ A matrix $A$ is called symplectic if $A J A^T = J$ with 
$J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, where 0 and 1 are 
zero and unity matrices. Then the eigenvalues of $A A^T$ occur in pairs $a_m$ and $a_m^{-1}$. 

The exponentially increasing states obtained from the transfer-matrix equation 
for an arbitrary energy are in general not eigenstates. The latter may in principle 
be constructed by starting the iteration (4) from both ends of the system, and 
matching the wave function and its derivative continuously at some site within 
the bulk. The eigenvalues $E_i$ of the Schrödinger equation are given by those 
energies for which such a continuous matching is successful. It follows that the 
localization lengths of the eigenfunctions $\psi_n^{(i)} = \langle n | i \rangle$ are 
given by 

$$\lambda(E_i) = \gamma_1^{-1}(E_i) . \qquad (9)$$

In the limit of infinite system length the eigenvalues are densely distributed. 
Their mean distance decreases as $N^{-1}$. Therefore, it does not make any sense to 
consider individual eigenstates. Eventually, it will be the average of the localiza- 
tion lengths within a certain energy interval AE which determines the transport 
properties, 

i 

It is one of the key assumptions of localization theory that this average can be 
calculated by averaging over an ensemble of macroscopically equivalent 
systems. In 1D, this hypothesis of ergodicity for the localization length can be 
proven. 



2.2 The Ordered Limit 



In order to demonstrate how the transfer-matrix equation works, we will now 
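consider a trivial example, namely the limit of an ordered system ($\sigma = 0$) with 
periodic boundary condition, $\psi_{N+1} = \psi_1$. Unfortunately, 

$$T_n(E) = \begin{pmatrix} E & -1 \\ 1 & 0 \end{pmatrix} = T \qquad (10)$$

is not diagonal. Diagonalization of the transfer matrix $T$ yields the eigenvalues 

$$\alpha_{1,2} = \frac{E}{2} \pm \sqrt{\frac{E^2}{4} - 1} \qquad (11)$$

with some unitary matrix $U$ which contains the corresponding eigenvectors. 
Then 

$$T_n = U \begin{pmatrix} \alpha_1 & 0 \\ 0 & \alpha_2 \end{pmatrix} U^{-1} , \qquad (12)$$

and after multiplication with $U^{-1}$ from the left one obtains for the amplitudes 

$$\begin{pmatrix} a_{N+1} \\ a_N \end{pmatrix} = \begin{pmatrix} \alpha_1^N & 0 \\ 0 & \alpha_2^N \end{pmatrix} \begin{pmatrix} a_1 \\ a_0 \end{pmatrix} \qquad (13)$$

with 

$$\begin{pmatrix} a_{N+1} \\ a_N \end{pmatrix} = U^{-1} \begin{pmatrix} \psi_{N+1} \\ \psi_N \end{pmatrix} . \qquad (14)$$

Therefore 

$$a_{N+1} = \alpha_1^N a_1 = a_1 e^{N\gamma} \qquad (15)$$

with $\gamma = \log \left( E/2 + \sqrt{E^2/4 - 1} \right)$. 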

For $|E| > 2$ there is no solution possible which fulfills the above periodic 
boundary condition. When $|E| < 2$, then $\gamma$ becomes complex, with $\mathrm{Re}\,\gamma = 0$ and 
$\mathrm{Im}\,\gamma = \arctan(\sqrt{4 - E^2}/E)$. Then 

$$a_{N+1} = a_1 \exp\left[ iN \arctan(\sqrt{4 - E^2}/E) \right] = a_1 e^{ikN} . \qquad (16)$$

This defines the energy eigenvalues 

$$E(k) = 2 \cos k , \qquad (17)$$

which correspond to the Bloch states $\psi_n(k) \propto e^{ikn}$. On the other hand, because 
of the boundary condition, $ikN = 2\pi i \kappa$, such that $k = 2\pi\kappa/N$ with 
$\kappa = 0, \pm 1, \pm 2, \ldots$. The “allowed” range of $k$ values is therefore given by the 1D 
Brillouin zone $-\pi < k \le \pi$. 
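
The ordered limit can be checked numerically by iterating (3) directly. The sketch below is an illustration only; the function name is hypothetical, and the initial conditions $\psi_0 = 0$, $\psi_1 = 1$ are an assumption for which the recursion has the closed-form solution $\psi_n = \sin(nk)/\sin k$ inside the band, $E = 2\cos k$.

```python
import math

def iterate_chain(E, eps, psi0=0.0, psi1=1.0):
    """Return [psi_0, psi_1, ..., psi_{N+1}] from recursion (3) for site energies eps."""
    psi = [psi0, psi1]
    for e in eps:
        psi.append((E - e) * psi[-1] - psi[-2])
    return psi

# Inside the band (|E| < 2) the amplitudes oscillate and stay bounded.
k = 0.7
inside = iterate_chain(2 * math.cos(k), [0.0] * 50)

# Outside the band (|E| > 2) they grow as exp(n * gamma), cf. (15).
outside = iterate_chain(3.0, [0.0] * 50)
growth_rate = math.log(outside[-1] / outside[-2])
```

Outside the band, `growth_rate` approaches $\gamma = \log(E/2 + \sqrt{E^2/4 - 1})$, in line with (15).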



2.3 The Localization Length 

Now, we consider the case of a disordered chain, i.e. $\sigma \neq 0$. We cannot, obviously, 
apply the same method as in the above example, since the eigenvalues 

$$\alpha_{1,2}^{(n)} = \frac{E - \varepsilon_n}{2} \pm \sqrt{\frac{(E - \varepsilon_n)^2}{4} - 1} \qquad (18)$$

depend on the lattice site, as do also the unitary matrices $U = U_n$. How can we 
proceed in this case? 

Since we are interested in the average exponential increase (or decrease) of 
the wave function, it is natural to ask for the dependence of $\overline{|\psi_N|^2}$ on $N$. Here, 
the overbar denotes the average with respect to the realizations of the disorder, 

$$\overline{A} = \int \prod_{n=1}^{N} d\varepsilon_n \, p(\varepsilon_n) \, A(\varepsilon_1 \cdots \varepsilon_N) , \qquad (19)$$

where we make use of the assumption that the distributions $p(\varepsilon_n)$ of the site 
energies do not depend on the other sites as mentioned in the introduction. 

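For a localized state, the modulus of the amplitude should decrease exponentially, 
or equivalently - if $\psi$ is not an eigenstate - increase exponentially. 
Formally, it is necessary to consider 

$$\overline{|\psi_{N+1}|^2 + |\psi_N|^2} = \overline{(\psi_1, \psi_0)\, \mathcal{T}_N^\dagger \mathcal{T}_N \begin{pmatrix} \psi_1 \\ \psi_0 \end{pmatrix}} , \qquad (20)$$

with the product matrix $\mathcal{T}_N = T_N \cdots T_1$ of (6), in order to avoid the effect of 
accidental vanishing of the amplitude on two 
successive sites. We use the initial conditions $\psi_1 = 1$, $\psi_0 = 0$. Then 

$$\overline{|\psi_{N+1}|^2 + |\psi_N|^2} = \overline{\left( \mathcal{T}_N^\dagger \mathcal{T}_N \right)_{11}} . \qquad (21)$$

The task is now to compute this matrix element as a function of $N$. First, we 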
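note that 

$$\mathcal{T}_N^\dagger \mathcal{T}_N = T_1^\dagger\, \mathcal{T}_{N-1}^\dagger \mathcal{T}_{N-1}\, T_1 . \qquad (22)$$

For convenience we have slightly changed the notation here, and have abbreviated 
$T_N T_{N-1} \cdots T_2 = \mathcal{T}_{N-1}$. This yields a recursive equation for the averaged matrix 
elements $\overline{T}_{ij}(N)$ of $\mathcal{T}_N^\dagger \mathcal{T}_N$. For $E = 0$, 

$$\begin{aligned}
\overline{T}_{11}(N) &= \overline{\varepsilon_1^2}\, \overline{T}_{11}(N-1) - \overline{\varepsilon_1}\, [\overline{T}_{21}(N-1) + \overline{T}_{12}(N-1)] + \overline{T}_{22}(N-1) , \\
\overline{T}_{22}(N) &= \overline{T}_{11}(N-1) , \\
\overline{T}_{12}(N) &= \overline{\varepsilon_1}\, \overline{T}_{11}(N-1) - \overline{T}_{21}(N-1) , \\
\overline{T}_{21}(N) &= \overline{\varepsilon_1}\, \overline{T}_{11}(N-1) - \overline{T}_{12}(N-1) ,
\end{aligned} \qquad (23)$$

since the random matrix elements $T_{ij}(N-1)$ are statistically independent of 
$\varepsilon_1$. For $\overline{T}_{11}(N)$, this yields 

$$\overline{T}_{11}(N) = \sigma^2\, \overline{T}_{11}(N-1) + \overline{T}_{11}(N-2) , \qquad (24)$$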

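because the average of $\varepsilon_1$ vanishes. One obtains eventually, by rearranging suitably, 

$$\frac{\overline{T}_{11}(N) - \overline{T}_{11}(N-2)}{2\, \overline{T}_{11}(N-1)} = \frac{\sigma^2}{2} . \qquad (25)$$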
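When the disorder is small, such that the localization length is large ($\lambda \gg 1$), 
we can replace the finite difference $\Delta$ by the total differential 

$$d \log \overline{T}_{11}(N) \approx \frac{\sigma^2}{2}\, dN , \quad (\sigma \to 0) , \qquad (26)$$

where $\sigma^2 = \overline{\varepsilon_1^2}$. By integration and exponentiation, this yields 

$$\overline{T}_{11}(N) = \overline{T}_{11}(0)\, e^{\sigma^2 N/2} = e^{2N/\lambda} . \qquad (27)$$

On average, the wave function increases exponentially with $N$. 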
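
The continuum step from (24) to (27) can be checked by iterating the averaged recursion itself. The sketch below is a hedged illustration: the function names are hypothetical, the starting values $\overline{T}_{11}(0) = \overline{T}_{11}(-1) = 1$ are an assumption corresponding to a unit starting matrix, and $\sigma^2 = W^2/12$ is the variance of a box distribution of width $W$.

```python
import math

def averaged_growth_rate(sigma2, n_steps=5000):
    """Iterate (24), a(N) = sigma2 * a(N-1) + a(N-2), and return the
    asymptotic value of log[a(N) / a(N-1)], i.e. the growth rate per site."""
    prev, cur = 1.0, 1.0
    rate = 0.0
    for _ in range(n_steps):
        prev, cur = cur, sigma2 * cur + prev
        rate = math.log(cur / prev)
        prev, cur = prev / cur, 1.0      # rescale so the numbers never overflow
    return rate

def localization_length_from_average(W):
    """Use (27): the averaged element grows as exp(2N / lambda)."""
    sigma2 = W * W / 12.0                # box distribution of width W
    return 2.0 / averaged_growth_rate(sigma2)

lam = localization_length_from_average(1.0)
```

For weak disorder the result stays close to $4/\sigma^2$, which is the estimate that follows from (26) and (27); the small deviation is the error of replacing the difference by a differential.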

For a box distribution of the random site energies with width $W$ we have 
$\sigma^2 = W^2/12$, such that the localization length 

$$\lambda = \frac{4}{\sigma^2} = \frac{48}{W^2} \qquad (28)$$

is finite for any finite disorder, and diverges at $W = 0$ with a critical exponent 
$\nu = 2$. The prefactor in this expression, when calculated more carefully, turns out 
to be 105 [8], instead of 48. This is a consequence of peculiarities in the statistics 
of the localization properties [9], and a singularity at $E = 0$ of the model [10] 
that have to be treated more carefully than the somewhat hand-waving argument 
used above. 

2.4 Resolvent Method 



We will now discuss a recursive method which is slightly more general than the 
above transfer-matrix method. It allows us not only to calculate the localization 
length, but also other electronic properties, i.e., the density of energy levels and 
the conductivity. 

The solutions of the Schrödinger equation, $E_i$, $|i\rangle$, are contained in the resolvent 
operator $G(z)$ [11] defined by 

$$G(z)(z - H) = 1 , \qquad (29)$$

where $z = E + i\eta$ is the complex energy variable. The spectral representation of 
the resolvent is 

$$G(z) = \sum_i \frac{|i\rangle\langle i|}{z - E_i} . \qquad (30)$$

In the representation of the site states, 

$$G_{jk}(z) = \langle j | G(z) | k \rangle = \sum_i \frac{\langle j | i \rangle \langle i | k \rangle}{z - E_i} \qquad (31)$$

is the (time-integrated) probability amplitude for an electron with energy $z$ to 
hop from the site $j$ to the site $k$. 

The energy eigenvalues of $H$ correspond to the poles of $G$ while the residues 
are given by the projection operators onto the basis states. By decomposing 
$H = H_0 + V$ and expanding $G$ in powers of $V/(z - H_0)$, we obtain the resolvent 
equation 

$$G(z) = G_0(z) + G_0(z) V G(z) , \qquad (32)$$

with the resolvent of the unperturbed system, $G_0(z) = (z - H_0)^{-1}$. 

If all of the wave functions with energies near $E$ are localized, say, $\langle j | i \rangle = 
f_i(j) \exp(-|j - j_i|/\lambda(E_i))$ with a random phase factor $f_i(j)$ and localization 
centre $j_i$, then the transition probability between site 1 and site $N$ of the system 
will be exponentially decreasing, 

$$|G_{1N}(E)|^2 \propto \exp(-2N/\lambda(E)) . \qquad (33)$$

The localization length may thus be obtained from the limiting behavior of $G_{1N}$. 
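In order to calculate $G_{1N}$ we decompose the Hamiltonian for the system 
consisting of $N$ sites, 

$$H^N = H^{N-1} + \varepsilon_N |N\rangle\langle N| + V |N\rangle\langle N-1| + V^\dagger |N-1\rangle\langle N| = H_0 + H_1 , \qquad (34)$$

with $H_1 = V |N\rangle\langle N-1| + V^\dagger |N-1\rangle\langle N|$ comprising the coupling between the 
$(N-1)$th and the $N$th site. The corresponding resolvent equation 

$$G^N_{jk} = G^N_{0jk} + \sum_{j'k'} G^N_{0jj'} H_{1j'k'} G^N_{k'k} \qquad (35)$$

with 

$$G^N_0 = G^{N-1} + \frac{|N\rangle\langle N|}{z - \varepsilon_N} \qquad (36)$$

may be cast into a set of recursive equations (for $j, k < N$) 

$$\begin{aligned}
G^N_{jk} &= G^{N-1}_{jk} + G^{N-1}_{j,N-1} V G^N_{Nk} , \\
G^N_{jN} &= G^{N-1}_{j,N-1} V G^N_{NN} , \\
G^N_{Nk} &= (z - \varepsilon_N)^{-1} V^\dagger G^N_{N-1,k} , \\
G^N_{NN} &= (z - \varepsilon_N)^{-1} + (z - \varepsilon_N)^{-1} V^\dagger G^N_{N-1,N} .
\end{aligned} \qquad (37)$$

This may be solved for $g_N = G^N_{NN}$, 

$$g_N = \frac{1}{z - \varepsilon_N - V^\dagger g_{N-1} V} . \qquad (38)$$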



This is the key equation for the recursive calculation of electronic properties. 
We demonstrate this for two examples. It is easy to see from (37) that 

$$G^N_{1N} = G^{N-1}_{1,N-1} V g_N = g_1 \prod_{j=2}^{N} V g_j , \qquad (39)$$

such that we obtain for the localization length 

$$\frac{1}{\lambda(E)} = -\lim_{N \to \infty} \frac{1}{2N} \log |G^N_{1N}|^2 \qquad (40)$$

the recursive equation 

$$\frac{N}{\lambda_N} = \frac{N-1}{\lambda_{N-1}} - \log |V g_N| \qquad (41)$$

or 

$$\frac{1}{\lambda_N} = -\frac{1}{N} \left( \log |g_1| + \sum_{j=2}^{N} \log |V g_j| \right) . \qquad (42)$$

This is completely equivalent to our earlier definition,² cf. (7,20). Although the 
above recursive procedure will lead to a numerical problem, since after relatively 
few iterations $G_{1N}$ becomes exponentially small,³ it can nevertheless be successfully 
applied even to very large systems N = O(10^°), since the theorem of 
Oseledec guarantees convergence, and can even be used to construct a prescription 
of how to estimate the statistical accuracy of the result as a function of the 
length of the system [7]. 

² As an exercise, show that $\psi_{N+1} = (G^N_{1N})^{-1}$. For a solution, see [7]. 

³ How can the numerical instability related to the exponential decrease of $G_{1N}$ be 
avoided? For a solution, see [7]. 
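
The recursion (38) together with the product form of $G_{1N}$ can be sketched as follows. This is an illustration under assumptions, not the procedure of [7]: the chain is strictly 1D with a real scalar coupling `V`, the logarithm of $|G_{1N}|$ is accumulated instead of $G_{1N}$ itself (one simple way to sidestep the underflow mentioned above), and all function names are hypothetical. The second function iterates (3) with $\psi_0 = 0$, $\psi_1 = 1$ so that the two approaches can be compared.

```python
import math
import random

def log_abs_g1n(z, eps, V=1.0):
    """Accumulate log|G_1N| using g_N = 1 / (z - eps_N - V g_{N-1} V)
    and G_1N = G_{1,N-1} V g_N."""
    g = 1.0 / (z - eps[0])                # g_1 for the single-site system
    total = math.log(abs(g))
    for e in eps[1:]:
        g = 1.0 / (z - e - V * g * V)     # eq. (38)
        total += math.log(abs(V * g))
    return total

def psi_end(z, eps):
    """psi_{N+1} from the transfer recursion (3) with psi_0 = 0, psi_1 = 1."""
    psi_prev, psi = 0.0, 1.0
    for e in eps:
        psi_prev, psi = psi, (z - e) * psi - psi_prev
    return psi

rng = random.Random(1)
eps = [rng.uniform(-1.0, 1.0) for _ in range(30)]
z = complex(0.3, 0.05)                    # E + i*eta with a small imaginary part
inverse_length = -log_abs_g1n(z, eps) / len(eps)
```

For `V = 1` the value of `log_abs_g1n(z, eps)` equals `-log(abs(psi_end(z, eps)))` up to rounding, which is the relation between the resolvent and the transfer-matrix amplitudes.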




174 Bernhard Kramer and Michael Schreiber 



A similar, though considerably more complicated procedure may be con- 
structed for the average linear conductivity as given by the Kubo formula [12], 

which can be cast into the form [1] 



aN{E,W,ri) = - 



4e^ 

TThiV 



Y^3kG%{GkjT + 

jk j 



(44) 



by using homogeneity of the system, i.e., that the configurational average of the 
absolute square of the Green’s function at the right hand side of (43) depends 
only on the distance between j and k. 

Using (37) we obtain the nonlinear set of recursive equations for the conduc- 
tivity 

4e^ 

with 

SN = Siv -1 + Re {qn [bN-i - - 2r]glfdN-iN + {ir} - rfg*,^)N‘^]) , 



d-N = lON^idN-i +vN) 



(46) 



bN = g% [&JV -1 — 2glf\dN-if + {ig — 2r]‘^g^)N^ — igdj^^ig^N] . 

This can be used for a numerical determination of the conductivity. The results 
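for $E = 0$ are consistent with a scaling law [12] 

$$\sigma(E = 0, W, \eta) = \frac{1}{W^2}\, \sigma\!\left(0, 1, \frac{\eta}{W^2}\right) , \qquad (47)$$

which can be used to extrapolate the conductivity to its value for $\eta \to 0$ from 
the numerically obtained data, 

$$\lim_{\eta \to 0} \sigma(0, W, \eta) \propto \lim_{\eta \to 0} \eta = 0 . \qquad (48)$$

If we identify $W^2 \propto \lambda^{-1}$, $\eta \propto \tau_\varphi^{-1} \propto L_\varphi^{-1}$, where $\tau_\varphi$ and $L_\varphi$ are a 
“phase-coherence time” and the corresponding “phase-coherence length”, respectively, 
we obtain 

$$\sigma(0, \lambda, L_\varphi) = \lambda\, \sigma\!\left(0, 1, \frac{L_\varphi}{\lambda}\right) . \qquad (49)$$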

When the temperature goes to zero, the phase-coherence length will extrapolate 
to infinity, and the conductivity to zero, since phase-breaking processes, such as 
electron-phonon scattering, will be frozen out. 

3 Finite-Size Scaling 

Both of the above methods can be generalized to dimensions $d > 1$, by considering 
quasi-1D systems of cross-sectional area $M^{d-1}$ and length $N$ ($\gg M$). One has 
to replace $\varepsilon_N$, $V$, and $g_N$ by matrices of dimension $M^{d-1} \times M^{d-1}$. The 
transfer-matrix equation (4) becomes a set of $2M^{d-1}$ equations. The set (37) becomes then a set 
of recursive matrix equations of the dimension $M^{d-1}$. As a result, one obtains 
$M^{d-1}$ localization lengths which correspond to the Lyapunov exponents.⁴ The 
smallest of these exponents defines the localization length 

$$\lambda_M = \frac{1}{\min_{m = 1, \ldots, M^{d-1}} |\gamma_m|} , \qquad (50)$$

since it eventually determines the exponential decay of the wave functions. 

Being interested in the d-dimensional infinite system we have to extrapolate 
the localization length for $M \to \infty$, a task which is not so easy to perform, given 
the usual computational restrictions. Using the above procedures, one can, for 
instance, determine data for M smaller than, say, 20 in 3D, at best. 

It is therefore imperative to search for scaling laws to be fulfilled by the data, 
in order to make reliable extrapolations possible. We have already encountered 
an example of such a scaling law in the previous section. 

As another example, let us consider the classical conductance of a metallic 
cube of the size $L^d$ at zero temperature. The residual conductivity $\sigma$ is in this case 
a finite material parameter that characterizes the electrical transport behavior. 
The conductance is 

$$g(L, W) = \sigma L^{d-2} , \qquad (51)$$

where $\sigma$ depends on the disorder, $\sigma \propto W^{-2}$. Depending on the dimensionality, 
the conductance behaves therefore according to a one-parameter scaling law 

$$g(L, \sigma) = f_\infty(L/\xi) , \qquad (52)$$

with $\xi = \sigma^{-1/(d-2)}$. For insulators, 

$$g(L, W) \propto |G_{1N}|^2 \propto e^{-2L/\lambda} = f_0(L/\lambda) . \qquad (53)$$

⁴ Since the transfer matrices are symplectic, the Lyapunov exponents come in pairs, 
$\gamma_m$ and $-\gamma_m$. Therefore, only $M^{d-1}$ of the $2M^{d-1}$ exponents are significant. They 
depend on the cross-sectional diameter of the system, and can always be arranged 
such that $0 < \gamma_m < \gamma_{m+1}$. 

In the following we want to generalize this one-parameter scaling idea. Let 
$\mathbf{a}(L) = (a_1(L), a_2(L), \ldots, a_M(L))$ be a vector with components that describe 
certain properties of the system. The dimension $M$ of this vector can also be 
infinite, and instead of the vector we can also consider a (continuous) function 
$a(\mu, L)$. How does $\mathbf{a}$ behave under scale transformations 

$$L' = bL , \qquad (54)$$

with $b$ a real number? We can certainly assume the general relation 

$$\mathbf{a}(bL) = \mathbf{F}(\mathbf{a}(L), b) . \qquad (55)$$



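More restrictive is the following assumption, 

$$\frac{d\mathbf{a}}{d \log L} = \mathbf{f}(\mathbf{a}(L)) = \lim_{\Delta L \to 0} \frac{\mathbf{F}(\mathbf{a}(L), 1 + \Delta L/L) - \mathbf{F}(\mathbf{a}(L), 1)}{\Delta L/L} . \qquad (56)$$

Let us further assume that there exists a fixpoint $\mathbf{f}(\mathbf{a}^*) = 0$, and $\mathbf{a} \approx \mathbf{a}^*$. By 
expanding into powers of $\mathbf{a} - \mathbf{a}^*$, one obtains a first-order linear differential 
equation for $\mathbf{a}(L)$ which can be solved, 

$$\mathbf{a}(L) = \mathbf{a}^* + \sum_m \boldsymbol{\varphi}_m \left( \frac{L}{L_m} \right)^{f'_m} . \qquad (57)$$

Here, $\mathbf{f}' \cdot \boldsymbol{\varphi}_m = f'_m \boldsymbol{\varphi}_m$ with the Jacobian matrix $\mathbf{f}'$ of the partial derivatives of 
$\mathbf{f}$, and its eigenvalues $f'_m$ and eigenvectors $\boldsymbol{\varphi}_m$. On the other hand, when $\mathbf{a}$ is a 
function of an additional parameter, say $\mathbf{a}(L) = \mathbf{a}(L, W)$, one can expand 

$$\mathbf{a}(L, W) = \mathbf{a}^* + \left. \frac{\partial \mathbf{a}(L, W)}{\partial W} \right|_{W^*} (W - W^*) . \qquad (58)$$

By comparing (57) and (58), one obtains 

$$L_m(W) = \frac{1}{(W - W^*)^{1/f'_m}} , \qquad (59)$$

such that 

$$\mathbf{a}(L, W) = \mathbf{a}^* + \sum_m L^{f'_m} \boldsymbol{\varphi}_m (W - W^*) . \qquad (60)$$

For large $L$ the largest positive $f'_m$ dominates the behavior near the fixpoint. The 
components of $\mathbf{a}$ that correspond to positive (negative) $f'_m$ are denoted relevant 
(irrelevant) scaling variables. The largest $f'_m$ defines the critical exponent, $\nu = 
1/\max f'_m$. 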

Let us consider two instructive examples. For the above-mentioned conductance 
we can define the $\beta$ function 

$$\frac{d \log g}{d \log L} = \beta(\log g) . \qquad (61)$$

In the metallic limit $\beta = d - 2$, whereas in the insulating region $\beta = \log g$. 
Assuming that $\beta$ is continuous and monotonically increasing, and recalling that 
in the metallic regime the maximally crossed diagrams yield a negative quantum 
correction to the conductance [13], one arrives at the conclusion that a 
disorder-induced metal-insulator transition (the so-called Anderson transition) can only 
occur in 3D. For $d \le 2$ no genuine metallic behavior can exist if interaction 
effects are ignored. The critical exponent is given by the inverse of the derivative 
of $\beta$ at the critical point defined by $\beta(\log g^*) = 0$. 
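
The argument can be made concrete with a toy $\beta$ function. The interpolation used below, $\beta(x) = d - 2 - \log(1 + e^{-x})$ with $x = \log g$, is purely an assumption for illustration and is not taken from the text or from [13]; it is continuous, monotonically increasing, tends to $d - 2$ for large $g$ and behaves like $\log g$ (up to a constant) for small $g$. The function names are hypothetical as well.

```python
import math

def beta(x, d):
    """Toy beta function of x = log g (illustrative interpolation, an assumption)."""
    return (d - 2) - math.log1p(math.exp(-x))

def fixed_point(d, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection for beta(x*) = 0; returns None if beta does not change sign."""
    if beta(lo, d) * beta(hi, d) > 0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta(lo, d) * beta(mid, d) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def critical_exponent(d, h=1e-6):
    """Inverse slope of beta at the fixed point (central difference)."""
    x_star = fixed_point(d)
    slope = (beta(x_star + h, d) - beta(x_star - h, d)) / (2 * h)
    return 1.0 / slope

for d in (1, 2, 3):
    print(d, fixed_point(d))
```

With this toy form a zero of $\beta$ exists only for $d = 3$, while for $d = 1$ and $d = 2$ the function stays negative, which mirrors the qualitative conclusion above. The exponent returned by `critical_exponent(3)` is a property of the toy interpolation and should not be read as a prediction.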

As mentioned above, the set of Lyapunov exponents $\gamma_m(M, W)$ determines 
the behavior of the states of a quasi-1D system. We define renormalized 
dimensionless Lyapunov exponents by $\tilde\gamma_m = M \gamma_m$. If $\tilde\gamma_1 \to \infty$ for $M \to \infty$, the system 
is insulating, if $\tilde\gamma_1 \to 0$ then it is metallic. The critical point $W^*$ is given by 
$\lim \tilde\gamma_1 > 0$. 

4 Numerical Evaluation of the Anderson Transition 

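4.1 Localization Length of Quasi-1D Systems 

As an example for the application of the methods described in the previous 
section we consider a strip of $M \times N$ sites. If $N \gg M$, this is a quasi-1D system. 
The length $N$ will be increased in the calculation until a preset accuracy is 
achieved. The width $M$ has to be relatively small due to the available computer 
power. But as mentioned in the previous section an extrapolation towards infinite 
width will be possible by finite-size scaling so that 2D systems can be described. 

As in the 1D case the Schrödinger equation (2) can be rewritten as a recursion. 
In analogy to (4) it reads 

$$\begin{pmatrix} \boldsymbol{\psi}_{n+1} \\ \boldsymbol{\psi}_n \end{pmatrix} = \begin{pmatrix} E - \mathbf{H}_n & -\mathbf{1} \\ \mathbf{1} & \mathbf{0} \end{pmatrix} \begin{pmatrix} \boldsymbol{\psi}_n \\ \boldsymbol{\psi}_{n-1} \end{pmatrix} = \mathbf{T}_n \begin{pmatrix} \boldsymbol{\psi}_n \\ \boldsymbol{\psi}_{n-1} \end{pmatrix} . \qquad (62)$$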

But in this case the transfer matrices $T_n$ transfer the amplitudes of the wave
function between all the M sites of the cross-sectional chains $n, n-1$ and
$n+1, n$. There are M linearly independent ways to distribute the amplitude of the
initial state on the M sites of the first chain. These possibilities can be treated
simultaneously if the respective M vectors of expansion coefficients are collected
in an amplitude matrix $\psi_n$ of size $M \times M$. The initial condition can then
be expressed by $\psi_1 = \mathbf{1}$ and $\psi_0 = \mathbf{0}$, where $\mathbf{1}$ and
$\mathbf{0}$ are the $M \times M$ unit and zero matrices, respectively.

The $2M \times 2M$ transfer matrix in (62) is the respective generalization of
the transfer matrix in (4): the unit matrices in $T_n$ transfer the particle (or
rather the amplitude of its wave function) between neighboring chains. The
Hamiltonian matrix $H_n$ comprises the random site energies $\varepsilon_{nm}$ of the
sites $m = 1, \ldots, M$ in the nth chain as well as the matrix elements $t\,(= 1)$
describing the transition of the particle within the chain. $H_n$ is given by

$$H_n = \begin{pmatrix}
\varepsilon_{n1} & 1 & 0 & 0 & 1 \\
1 & \varepsilon_{n2} & 1 & 0 & 0 \\
0 & 1 & \varepsilon_{n3} & 1 & 0 \\
0 & 0 & 1 & \varepsilon_{n4} & 1 \\
1 & 0 & 0 & 1 & \varepsilon_{n5}
\end{pmatrix} \tag{63}$$

for a chain of $M = 5$ sites. Here the nonzero matrix elements in the upper right
and lower left corners of the matrix reflect the periodic boundary condition. We
note that (63) is exactly equivalent to the Hamiltonian matrix for a 1D system
as constructed in Schreiber's chapter of this book.
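
As an illustration, the matrices (63) and (62) can be assembled in a few lines.
The following is a minimal sketch in Python with NumPy, not the FORTRAN of the
enclosed programs. The function names are hypothetical, and the sign convention
$E\mathbf{1} - H_n$ for the upper left block is an assumption that should be
checked against (2) and (4).

```python
import numpy as np

def chain_hamiltonian(eps, t=1.0):
    """Hamiltonian of one cross-sectional chain, cf. (63).

    eps: the M random site energies of the chain.
    Periodic boundary condition is applied for M > 2.
    """
    eps = np.asarray(eps, dtype=float)
    M = eps.size
    H = np.diag(eps)
    for m in range(M - 1):
        H[m, m + 1] = H[m + 1, m] = t      # hopping within the chain
    if M > 2:
        H[0, M - 1] = H[M - 1, 0] = t      # corner elements: periodic boundary
    return H

def transfer_matrix(E, H):
    """2M x 2M transfer matrix, cf. (62) (assumed sign convention)."""
    M = H.shape[0]
    one, zero = np.eye(M), np.zeros((M, M))
    return np.block([[E * one - H, -one], [one, zero]])

def is_symplectic(T):
    """Check T^T J T = J with J = [[0, 1], [-1, 0]]."""
    M = T.shape[0] // 2
    one, zero = np.eye(M), np.zeros((M, M))
    J = np.block([[zero, one], [-one, zero]])
    return np.allclose(T.T @ J @ T, J)
```

The helper is_symplectic verifies numerically the property of $T_n$ that is used
in the discussion of the Lyapunov exponents.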

As in the 1D case, the product matrix $\mathcal{T}_N$ of the transfer matrices
satisfies Oseledec's theorem, i.e., the limiting matrix $\Gamma$ as given by (7)
exists. Since $T_n$ and thus $\mathcal{T}_N$ are symplectic matrices, the eigenvalues
of $\Gamma$ occur pairwise as $e^{\gamma_m}$ and $e^{-\gamma_m}$, where
$\gamma_m\,(> 0)$ and $-\gamma_m$ are the Lyapunov exponents describing the
exponentially increasing and decreasing amplitudes along the quasi-1D system.
These Lyapunov exponents characterize how the amplitudes of the initial states,
which are described by the expansion coefficients in $\psi_n$, "drift apart"
exponentially. The inverse values reflect the different characteristic length
scales. The largest length (i.e., the inverse of the smallest positive Lyapunov
exponent) describes the weakest possible decay of the transmission probability
along the quasi-1D system for a state at a given energy. This length is commonly
associated with the localization length, implicitly assuming that the electronic
states are exponentially localized. We shall denote this length by
$\lambda_M = \gamma_1^{-1}(M)$.

In principle the eigenvalues can be determined by the repeated multiplication
of the transfer matrices onto an arbitrary initial vector. This product converges
towards the largest eigenvalue of $\mathcal{T}_N$ times its eigenvector. That
eigenvalue, however, reflects the fastest exponential increase. But as the
localization length $\lambda_M$ is given by the slowest exponential increase, we
have to determine that eigenvalue of $\mathcal{T}_N$ which is closest to unity, or,
equivalently, the smallest (positive) Lyapunov exponent $\gamma_1(M)$. This can be
achieved by the repeated multiplication of the transfer matrices $T_n$ onto M
orthogonal initial amplitude vectors, which will eventually provide all the
Lyapunov exponents if the orthogonalization of the amplitude vectors is
maintained.

This determination of the Lyapunov exponents from the asymptotic behavior
of the recursion (4) is conceptually slightly different from the diagonalization of
the product matrix (6). However, due to the initial values $\psi_1 = \mathbf{1}$ and
$\psi_0 = \mathbf{0}$ the left-hand side of the 2D analogue of (5) is identical with
the first M columns of the product matrix. Due to the symplecticity of the product
matrix the knowledge of these columns is sufficient for the determination of all
Lyapunov exponents.

But the numerical problem is that the ratio of the smallest to the largest
eigenvalue of $\mathcal{T}_N$ becomes comparable with the machine accuracy after
very few multiplications. This means that the smallest eigenvalue would be lost
very soon. These convergence problems are similar to those mentioned in Sect. 2.4
with respect to the calculation of the localization length from the resolvent
according to (42). To circumvent this problem one has to reorthonormalize the
left-hand side of (62) regularly during the procedure. To be specific, we
orthogonalize each of its M columns (each a vector of length 2M) onto the previous
columns, i.e., we perform the standard Gram-Schmidt orthogonalization procedure.
Then the first column converges towards the eigenvector corresponding to the
largest eigenvalue, the second column to the second largest, and so on. The
normalization of the eigenvectors yields asymptotically the respective
eigenvalues. In practice, it is sufficient to perform the reorthonormalization
after every 10 steps of the iteration (62). The Mth normalization constants of all
these Gram-Schmidt procedures have to be multiplied to determine the overall
normalization of the Mth eigenvector and thus the eigenvalue closest to unity. In
practice, the logarithms are summed, yielding the smallest Lyapunov exponent and
thus the inverse localization length. This sum can also be interpreted as an
average over the respective Lyapunov exponents of many short strips of length 10.
Accordingly, the fluctuations of these data can be used to determine the
statistical error of the result in a straightforward way by computing the variance.
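
The procedure described above can be sketched as follows. This is a minimal
Python illustration under stated assumptions, not a transcription of the enclosed
FORTRAN program: the function name smallest_lyapunov and its parameters are
hypothetical, the site energies are assumed to be drawn uniformly from
$[-W/2, W/2]$, the sign convention of (62) is assumed as
$\psi_{n+1} = (E - H_n)\psi_n - \psi_{n-1}$, and a QR decomposition from NumPy is
used in place of an explicit Gram-Schmidt loop (both orthonormalize the columns
in order).

```python
import numpy as np

def smallest_lyapunov(M, W, E, n_steps, n_ortho=10, seed=0):
    """Estimate the smallest positive Lyapunov exponent of a strip of width M."""
    rng = np.random.default_rng(seed)
    # Hopping within one chain (t = 1), periodic boundary for M > 2.
    hop = np.zeros((M, M))
    for m in range(M - 1):
        hop[m, m + 1] = hop[m + 1, m] = 1.0
    if M > 2:
        hop[0, M - 1] = hop[M - 1, 0] = 1.0

    psi_new = np.eye(M)            # psi_1 = unit matrix
    psi_old = np.zeros((M, M))     # psi_0 = zero matrix
    log_norms = []
    for n in range(1, n_steps + 1):
        eps = W * (rng.random(M) - 0.5)          # random site energies
        # One recursion step, exploiting the simple structure of T_n.
        psi_next = (E * psi_new - eps[:, None] * psi_new
                    - hop @ psi_new - psi_old)
        psi_old, psi_new = psi_new, psi_next
        if n % n_ortho == 0:
            # Reorthonormalize the 2M x M matrix of column vectors.
            q, r = np.linalg.qr(np.vstack([psi_new, psi_old]))
            # Mth normalization constant: growth of the slowest vector.
            log_norms.append(np.log(abs(r[-1, -1])))
            psi_new, psi_old = q[:M], q[M:]

    per_step = np.array(log_norms) / n_ortho     # exponents of short strips
    gamma = per_step.mean()
    error = per_step.std(ddof=1) / np.sqrt(per_step.size)
    return gamma, error
```

The localization length then follows as the inverse of the returned exponent. The
error estimate treats the short strips as statistically independent, which is
itself an assumption of this sketch.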

The FORTRAN program loc2d1.for on the enclosed diskette¹ performs
the recursion (62) for a quasi-1D strip. Due to the very simple structure of the
transfer matrices, the matrix-matrix multiplication in (62) can be programmed
in a very efficient way, avoiding all multiplications with 1 and all summations of 0.
The recursion (62) is performed until the requested relative accuracy is achieved
unless the preset maximum number of allowed recursion steps is reached earlier.
The data for the localization length and its variance are written into the file
lambda_N.dat and can be visualized with the gnuplot program lambda_N.gnu.
A typical example is presented in Fig. 1, demonstrating the large fluctuations
and the slow convergence of the transfer-matrix procedure.

4.2 Dependence of the Localization Length on the Cross Section 

A second version of the program, loc2d2.for, is provided for calculating the
converged values of the localization length for various input parameters. For
each combination of energy and disorder the results are stored in a separate
file lambda##.dat, where ## denotes a consecutive number. These data and
their dependence on the width M of the quasi-1D strip can then be visualized
by means of the gnuplot program lambda_M.gnu. In this way the dependence
of the localization length on the various parameters can be investigated. The
reader is encouraged to try different combinations, but should be aware that the
localization length for small values of the disorder may become very large so that
a reasonable accuracy can be achieved only for very long strips. For zero disorder
the energy eigenvalues of the Schrödinger equation are limited to $|E| < 4$, which
follows from the 2D version of (17). Therefore reasonable values for the energy
parameter are given by $|E| < 4 + W/2$. The width M of the strip should not
be chosen too small to avoid strong finite-size effects. The available computer
power will limit the possible values for the width. The arrays in the program
allow values up to M = 64. Of course, this limit as well as the preset accuracy
of 1% can easily be changed. We have estimated that the necessary computation
time increases in proportion to the inverse square of the relative accuracy and
the fifth power of the strip width in 2D systems.

¹ For copyright reasons we have not included a random-number generator with the
program. Any generator will be sufficient, e.g., the very simple one described in
Stauffer's chapter of this book.

Fig. 1. Dependence of the localization length $\lambda_M$ of a quasi-1D strip on the
length N of the system (horizontal axis) for energy E = 0 and disorder W = 5. The
cross section is a linear chain of M = 10 sites with periodic boundary condition.
The thick line reflects the localization length, the thin lines indicate the
accuracy of the data determined from the variance as discussed in the text. At the
end of this calculation (N = 20000) the achieved accuracy was 2.6%.



It is also instructive to vary the number of recursion steps which are performed
between subsequent reorthonormalizations. If it is chosen smaller, the
computer time increases. If it is chosen larger, wrong results are calculated. This
can be easily verified by comparing various runs of the program loc2d1.for.

As discussed in the previous section no genuine metallic behavior can exist
in 2D systems if interaction effects or magnetic fields are ignored. In 3D samples
metallic behavior is expected for small energy and small disorder, while
insulating behavior is expected for large energy and/or large disorder. Of course,
it is most interesting to investigate the metal-insulator transition which
separates these two regimes. For this purpose, it is straightforward to generalize
(62) to 3D systems. Now the cross section will be a slice consisting of a square
lattice of $M \times M$ sites. There are $M^2$ linearly independent possibilities for
distributing the amplitudes on the first slice so that $\psi_n$ becomes an
$M^2 \times M^2$ matrix. The respective Hamiltonian matrix $H_n$ for the cross
section is the 2D analogue of (63). It is explicitly constructed in Schreiber's
chapter in this book.

However, the calculation of a sufficiently large data set of localization lengths
for different parameter values in 3D is too time-consuming for simple exercises.
Therefore we have included respective data sets on the diskette for 24 different
disorders W (keeping the energy E = 0 constant) in the files raw##.dat. Each
data set contains the localization lengths for 8 different values of M. The data
can be plotted with the gnuplot program raw.gnu in the same way as the 2D
data of the above-described exercise.

Fig. 2. Localization length $\lambda_M$ of anisotropic quasi-1D bars with cross
section of $M \times M$ sites for different disorder W and energy E = 0. The length
of the bar was increased in each case until the accuracy of 1% was achieved for the
data points. Connecting lines are for guiding the eye only.

In order not to duplicate previously published data we have determined these
data for a slightly modified system, choosing different transition matrix elements
between nearest neighbors along and across the quasi-1D bar. Specifically we
have used t = 0.4 along the bar and t = 1 within the slices. The data are
presented in Fig. 2.

These raw data can already be evaluated for the determination of the
metal-insulator transition as indicated at the end of the previous section with
regard to the renormalized Lyapunov exponents $\tilde\gamma_1$. For small disorder,
$\tilde\gamma_1^{-1} = \lambda_M/M$ increases with increasing M. This means that
$\lambda_M$ grows faster than the extension of the system. Consequently, the
localization length will become infinite in an infinitely wide system. This
corresponds to metallic behavior. On the other hand, for large disorder
$\lambda_M/M$ decreases with increasing M, which means that eventually for large M
the electronic states will completely fit into the bar. They will be localized;
this is a signature of insulating behavior. The metal-insulator transition occurs
at the fixed point, i.e., for that value of the disorder for which $\lambda_M/M$
remains constant. Due to the fluctuations of the data in Fig. 2 it is difficult to
determine this critical disorder. But one can already estimate $W^* \approx 12$.



4.3 Finite-Size Scaling Numerically 

The raw data presented in Fig. 2 can be fruitfully exploited by making use of the
finite-size-scaling ideas presented in the previous section. Now we assume that
$\lambda_M/M = \Lambda$ is a suitable scaling variable, which can be expressed as a
function $f$ of the system size M and some scaling parameter $\xi$ which does not
depend on M but only on the disorder W (and in general also on the energy E, which
is kept constant in our example). This one-parameter scaling law

$$\Lambda = f(\xi/M) \tag{64}$$

corresponds to the scaling laws (51) and (52) in the previous section, in which
the conductance itself was taken as the relevant scaling variable.

We point out that (64) is an ansatz, and whether or not the raw data in
Fig. 2 fulfill this scaling relation has to be verified. This can be done in
a quantitative way by attempting a least-squares fit of the data onto a
common curve $f$. This requires a suitable adjustment of the scale of M (or
equivalently of 1/M) by means of the scaling parameter $\xi(W)$ for each disorder
W. We have performed such a least-squares fit successfully; the results
for $\xi(W)$ are supplied in the file xi.dat. The program scaling.for performs
nothing but a multiplication of the sets of raw data with the corresponding
scaling parameter. The scaled data can then be plotted using scacurve.gnu.
The result is presented in Fig. 3. This figure demonstrates that the attempt to
find a common functional relationship $f(\xi/M)$ has been successful within the
accuracy of our raw data. Thus the assumption of one-parameter scaling has
been numerically corroborated.
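
The rescaling step itself is simple enough to sketch in a few lines. The following
Python fragment is an illustration only, not the enclosed scaling.for: the function
name scale_data and the dictionary layout of the raw data and of the scaling
parameters are assumptions made here, not the format of raw##.dat or xi.dat.

```python
import numpy as np

def scale_data(raw, xi):
    """Rescale raw localization lengths onto the variables of (64).

    raw: {W: (M_values, lambda_values)} for each disorder W.
    xi:  {W: scaling parameter xi(W)}.
    Returns {W: (ln(xi/M), ln(lambda_M/M))}, ready for a doubly
    logarithmic plot of the scaling curve.
    """
    scaled = {}
    for W, (M, lam) in raw.items():
        M = np.asarray(M, dtype=float)
        lam = np.asarray(lam, dtype=float)
        scaled[W] = (np.log(xi[W] / M), np.log(lam / M))
    return scaled
```

If one-parameter scaling holds, the points returned for all disorders fall onto a
single curve; in the logarithmic variables a change of $\xi(W)$ is just a horizontal
shift of the data set for that disorder.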

We note that there is an ambiguity in the scaling curve in Fig. 3, because the
entire curve can be shifted by multiplying all scaling parameters $\xi$ with a
common factor. But for small $\Lambda$ the cross-sectional diameter of the
investigated bar is already much larger than the localization length $\lambda_M$.
It is therefore a reasonable approximation that in this case $\lambda_M$ is already
close to the 3D localization length. This means that $f(x) = x$. Applying this
relation to the data set for the largest disorder in Fig. 2 resolves the mentioned
ambiguity.

The most prominent feature of the scaling curve in Fig. 3 is the existence
of two branches. These correspond of course to the two qualitatively different
behaviors of the localization length in Fig. 2, increasing faster or slower than
the lateral system size M, as discussed in the previous subsection. Accordingly,
the upper branch of the scaling curve in Fig. 3 corresponds to the metallic regime,
the lower branch to the insulating regime. The branches touch at the
metal-insulator transition, at which the scaling parameter $\xi$ should diverge. Of
course such a divergence cannot be expected in the numerical analysis due to
rounding errors. A close inspection of this regime in Fig. 3 shows that several
data sets overlap in this area. This is not surprising, if one takes into account
that the raw data have been determined with a statistical accuracy of 1%. By the
least-squares fit procedure we minimize the vertical deviations of the data from
the scaling curve in Fig. 3. However, a large horizontal shift of a data set in this
regime produces only a small difference in the vertical agreement of the data set
with the scaling curve. Therefore the fit cannot be accurate in this regime.

Fig. 3. Scaling function for the anisotropic 3D system. All raw data of Fig. 2 are
scaled onto a common curve $f$ by changing the scale of $M^{-1}$ via fitting
parameters $\xi(W)$. (Horizontal axis: $\ln \xi/M$.)

This is a common problem in the numerical analysis of phase transitions.
The divergence of any curve is rounded. The usual impression is that it should
be easily possible to improve the fit by hand. The reader is encouraged to do this
and to perform the entire scaling procedure by hand. For this purpose we have
supplied a second file of scaling parameters, xi.own, in which all values are set
to unity in the beginning. Running the scaling program and plotting the scaling
curve with these parameters reproduces Fig. 2, but in a doubly logarithmic plot.
The reason for the logarithmic $\Lambda$ axis is only a better resolution. But the
reason for the logarithmic M axis is more important for the following exercise.
Now changing the scale by a factor $\xi$ means adding a constant $\log \xi$ to
$\log M$ so that the change of scale in the plot only means a horizontal shift of
the data set for each disorder W. The reader is encouraged to perform this shift
consecutively for each disorder, i.e., changing the data in xi.own, scaling the raw
data accordingly with scaling.for, and plotting the resulting scaling curve with
scacurve.gnu. Experience shows that this graphical construction of a master curve
is often more successful than the least-squares fit which provided the data in
xi.dat. An example is presented in Fig. 4, in which the scaling parameters from the
least-squares procedure are compared with results of such a scaling "by hand".
A divergence of the characteristic length $\xi(W)$ at the critical disorder
$W^* \approx 12$ can be clearly seen. However, the divergence is significantly
rounded due to the numerical procedure. It can also be seen from Fig. 4 that the
data which have been obtained "by hand" yield a much smoother curve. It must be
pointed out that the graphical construction of the respective master curve
corresponding to Fig. 3 has been performed as explained above, i.e., trying to
improve the smoothness of the scaling curve but not looking at the smoothness of
the $\xi(W)$ curve in Fig. 4.

Fig. 4. Scaling parameters $\xi$ which have been used for the construction of the
scaling curve in Fig. 3 (full line). Data from a graphic construction of the scaling
curve "by hand" are also included (broken line). (Horizontal axis: W.)

We note that the deviation of the scaling parameters in the metallic regime
in Fig. 4 is not significant. This deviation is due to the fact that in principle both
branches of the scaling curve in Fig. 3 can be derived independently. Therefore
it is necessary to determine an asymptotic behavior of the scaling curve in the
metallic regime for small disorder. This can be performed by associating $\xi$ with
the resistivity of the 3D system [7], yielding an asymptotic form of $f(x)$ in
agreement with (51). We did not invoke this asymptotic relation for the
construction of the scaling curve because we have not included raw data for a
sufficiently small disorder.

Finally we note that the critical exponent $\nu$ can be derived according to
(59) from the data in Fig. 4. A nonlinear fit to the data in xi.own yielded a value
$\nu \approx 1.4$ in agreement with previous investigations [1, 14] of isotropic 3D
systems. Of course, the value cannot be very accurate because we have not used as
many raw data as in those investigations. Nevertheless it is another corroboration
of the universality of one-parameter scaling that the critical exponents of
isotropic and anisotropic systems agree.
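
A simple way to extract an exponent from a table of scaling parameters is
sketched below. It rests on assumptions that are not taken from the text: that (59)
has the usual power-law form $\xi \propto |W - W^*|^{-\nu}$, that $W^*$ is fixed
beforehand, and that a straight-line fit in doubly logarithmic variables is used
instead of the nonlinear fit mentioned above. The function name fit_nu is
hypothetical.

```python
import numpy as np

def fit_nu(W, xi, W_crit):
    """Estimate nu from xi ~ |W - W_crit|**(-nu) by a log-log linear fit.

    W, xi:  disorders and scaling parameters on ONE side of the transition.
    W_crit: assumed critical disorder, kept fixed during the fit.
    """
    x = np.log(np.abs(np.asarray(W, dtype=float) - W_crit))
    y = np.log(np.asarray(xi, dtype=float))
    slope, _intercept = np.polyfit(x, y, 1)
    return -slope
```

Because the divergence of $\xi(W)$ is rounded close to the transition, data points
very near $W^*$ would bias such a fit and are better excluded.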

For completeness we note that in practice another determination of the critical
exponent has proven more accurate. It follows from (60) that close to the
critical disorder the system-size dependence of the raw data is determined by
that contribution to the sum in (60) which is governed by the critical exponent.
This allows a straightforward determination of the critical exponent $\nu$ from the
raw data without any necessity of constructing a scaling curve.



5 Present Status of the Results from Transfer-Matrix Calculations

The status of the numerical work concerning the disorder-induced metal-insulator 
transition can be briefly summarized. 

Numerous and extensive calculations were performed for the (real) Anderson
Hamiltonian on a simple cubic lattice with different distributions of the lattice
site energies. Results have been compiled in [1]. Basically, for energies inside
the unperturbed band, $|E| < 6$, the above one-parameter scaling hypothesis was
confirmed. The critical exponents of the conductivity and of the localization
length were found to be equal within the achieved accuracy. Their numerical
values were determined as $\nu = 1.45 \pm 0.15$. Previously obtained values, which
differed for different distributions [15], were too strongly influenced by the
above-discussed numerical rounding of the divergence of the scaling parameter at
the phase transition. An intricate method of evaluating the raw data based on a
quantitative criterion for the exclusion of inaccurate data points close to the
transition [14] yielded $\nu \approx 1.35$ for the box, the Gaussian, and the binary
distribution, i.e., independent of the distribution of the site energies, again
within the accuracy. Thus, universality can be considered to have been explicitly
demonstrated for this model, which belongs to the so-called orthogonal
universality class of time-reversal invariant systems.

In 2D the numerical data obtained so far for time-reversal invariant systems
include square, triangular, and honeycomb lattices [16]. Irrespective of the number
of nearest neighbors all the results are fully consistent with the expectation that
$\nu$ diverges with an essential singularity for vanishing disorder. All of the
quantum states are therefore localized in 2D when the disorder is finite.

The divergence of the critical exponent in 2D systems was recently corroborated
by applying the transfer-matrix method to bifractal systems [17]. The cross
section of the investigated quasi-1D bifractals is a fractal, e.g., the Sierpinski
gasket described in this book in the chapter by Bunde et al. or a cluster
constructed by diffusion-limited aggregation. These fractals are stacked regularly
along the direction in which the recursion (62) is performed. Therefore a
straightforward application of the transfer-matrix method is feasible for these
bifractals, which are characterized by dimensions $2 < d < 3$. Thus the
dimensionality dependence of the critical disorder $W^*$ and the critical exponent
$\nu$ could be obtained. For $d \to 2$ the vanishing $W^*$ and the diverging $\nu$
corroborated that $d = 2$ is the lower critical dimension of the Anderson model.
Moreover the behavior of $\nu$ was shown to agree with a prediction obtained from
the $\epsilon$ expansion within the nonlinear $\sigma$ model [18].

Some numerical work has been performed on the "unitary universality class" of
systems that are not time-reversal invariant. They describe
a charged particle in a magnetic field in the presence of disorder. Although
from general considerations one expects that the critical behavior of the Anderson
transition should change upon a change in the symmetry class of the system
(from orthogonal to unitary), the numerically obtained data indicate that within
the accuracy the critical exponent is the same, namely $\nu = 1.35 \pm 0.15$,
independent of the strength of the (homogeneous) magnetic field [19]. This was
obtained for different models. It means that universality is seen in the numerical
results also in this case. The surprising conclusion is that the change in the
symmetry causes only minor changes in the critical behavior at the Anderson
transition.

On the other hand, when the potential disorder is removed, and instead the
magnetic field is randomized in order to produce a localization-delocalization
transition, the critical behavior changes. First of all, due to the absence of the
disorder, the position of the transition is situated near the band edge, in an
energy region where the density of states changes rapidly [20]. Second, the critical
exponent is found to be $\nu = 1.05 \pm 0.15$. This is surprising because the system
still belongs to the unitary universality class. When in addition to the random
magnetic field the random potential is again introduced, the critical exponent
turns out to be the same as the one obtained in the case of the homogeneous
magnetic field. It seems that the influence of the scalar random potential on the
critical properties is much stronger than the changes in the symmetry. Within
the conventional wisdom of phase transitions this is at the least somewhat
unexpected. Considerably more work has to be done in the future in order to clarify
this issue.

In 2D a homogeneous magnetic field has dramatic effects on the localization
properties. First of all, the electronic spectrum splits into Landau bands.
They are well separated as long as the disorder is not too large. In this limit
careful numerical work was carried out, showing that for each Landau band the
one-parameter scaling hypothesis is valid. The localization length was found to
diverge at the centres of the Landau bands with an exponent
$\nu = 2.34 \pm 0.04$ [21]. This result was obtained by completely different methods
for a variety of different potential models, including white-noise Gaussian as well
as very long-range correlated randomness. It constitutes one of the most striking
and precise demonstrations of universality of the disorder-induced
localization-delocalization transition.

For a random magnetic field in 2D the present conclusions from the numerical 
and analytical work are controversial. They range from the statement that all of 
the states are again localized [22, 23] to the suggestion of a Kosterlitz-Thouless 
transition [24]. Again much more, and particularly more precise, work has to be 
done for this problem. 

Spin-orbit scattering induces another change of symmetry. As a consequence,
one expects changes in the critical behavior. Indeed, in 2D, instead of only
localized states, a localization-delocalization transition occurs. It has been
investigated numerically to some extent [25, 26, 27]. However, due to the enormous
requirements of computer power, the reliability of the results is not comparable
with that of the above-described cases. Also for this problem considerable
numerical efforts are necessary in order to answer satisfactorily questions
concerning the "validity of one-parameter scaling" or the "universality".

Abstract. Quantum Monte Carlo (QMC) simulations are a powerful means to obtain
information about quantum mechanical systems with strong interactions. The chapter
shows how the quantum Monte Carlo method is applied to strongly correlated systems
at finite temperature, which can be described by the Hubbard model. The partition
function is decomposed using the Trotter-Suzuki transformation (TS), the interaction
is decoupled using the Hubbard-Stratonovich transformation. The problems that arise
due to numerical instabilities and the negative sign problem, which is due to the Monte
Carlo sampling technique, are briefly mentioned.



1 Introduction 



1.1 The Hubbard Model 



It is not possible to describe the complex behavior of Hubbard-like Hamiltonians 
and their simulation in a single chapter without simplifications. Therefore the 
diagrams throughout this article show “classical particles” moving and interact- 
ing, disregarding the occurrence of collective phenomena which are important 
for the “real” physics of the problem. 

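A general Hamiltonian for a system of N electrons which interact in real
space via the Coulomb interaction in an external potential $V$ can be written
as (spin indices are omitted)

$$H = \underbrace{\sum_{i=1}^{N} \frac{\mathbf{p}_i^2}{2m}}_{E_{\rm Kin}}
+ \underbrace{\frac{1}{2} \sum_{i \neq j} \frac{e^2}{|\mathbf{r}_i - \mathbf{r}_j|}}_{E_{\rm Int}}
+ \underbrace{\sum_{i=1}^{N} V(\mathbf{r}_i)}_{E_{\rm Pot}} , \tag{1}$$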



where $\mathbf{r}_i$ represents the coordinates of the electrons in real space. For
tight-binding electrons in a solid, $\mathbf{r}_i$ will not be continuous: only
certain "lattice points" can be occupied, and the Coulomb interaction will be
screened. To describe interacting electrons in the tight-binding approximation, we

1. use the occupation number formalism to allow only discrete sites as particle
positions;

2. neglect the external potential, so that our system becomes translationally
invariant;

3. retain only the on-site interaction.

(Footnote: Software included on the accompanying diskette.)

We obtain the Hubbard Hamiltonian

$$H = \underbrace{-t \sum_{\langle i,j \rangle, s} c^\dagger_{i,s} c_{j,s}}_{E_{\rm Kin}}
+ \underbrace{U \sum_i n_{i\uparrow} n_{i\downarrow}}_{E_{\rm Int}}
- \underbrace{\mu \sum_i (n_{i\uparrow} + n_{i\downarrow})}_{\text{Chem. Pot.}} , \tag{2}$$

where the operator $c^\dagger_{i,s}$ creates a particle with spin s at site i and
$c_{i,s}$ annihilates a particle with spin s at site i.
$n_{i,s} = c^\dagger_{i,s} c_{i,s}$ is the number operator, whose eigenvalue is the
occupation number of site i with a fermion of spin s. The summation index i runs
over all lattice sites, the hopping term $E_{\rm Kin}$ acts between
nearest-neighbor sites i, j, and the Hubbard on-site interaction $E_{\rm Int}$ is
felt by electrons with opposite spins on the same site. The last term in (2) is
analogous to $-PV$ in classical physics and indicates the injection of particles
into the system if $\mu > 0$, whereas particles are drawn away from the system if
$\mu < 0$; the situation $\mu = 0.5$ corresponds to a half-filled system,
$\langle n_\uparrow \rangle = \langle n_\downarrow \rangle = 0.5$. Fig. 1
contrasts the two cases, Hamiltonian (1) and Hamiltonian (2). Fig. 2
shows the behavior of the system for three different electron densities.



Fig. 1. Simplified behavior of the electrons for the Hamiltonian (1), shown left
(Coulomb interaction, continuous $\mathbf{r}_i$), and the Hamiltonian (2), shown
right (hopping, Hubbard interaction). The diagrams are one-dimensional
simplifications of the two-dimensional situation.



In the following, the energy and the temperature will be given in units of t = 1
($k_B = 1$). The hopping term is diagonal in momentum space, the interaction term
is diagonal in real space. Therefore, perturbation expansions are valid only for
certain regimes of doping.
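
That the hopping term is diagonal in momentum space can be illustrated for the
noninteracting case U = 0. The sketch below is an assumption-laden example, not
part of the accompanying software: it assumes a square lattice with periodic
boundary conditions and nearest-neighbor hopping, for which the single-particle
energies are $\varepsilon(\mathbf{k}) = -2t(\cos k_x + \cos k_y)$, and it fills
these levels with the Fermi function. The function name is hypothetical.

```python
import numpy as np

def free_momentum_distribution(L=8, t=1.0, beta=4.0, mu=0.0):
    """Momentum distribution n(k) of the U = 0 Hubbard model on an L x L lattice.

    Assumes periodic boundary conditions and nearest-neighbor hopping t.
    Returns an L x L array indexed by (k_x, k_y) with k = 2*pi*n/L.
    """
    k = 2.0 * np.pi * np.arange(L) / L
    kx, ky = np.meshgrid(k, k, indexing="ij")
    eps = -2.0 * t * (np.cos(kx) + np.cos(ky))       # tight-binding band
    return 1.0 / (np.exp(beta * (eps - mu)) + 1.0)   # Fermi function

n_k = free_momentum_distribution()
filling_per_spin = n_k.mean()   # average occupation per site and spin
```

For U = 0 and the chemical potential set to zero, the particle-hole symmetry of
this band gives an average occupation of one half per site and spin.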

The Hubbard Hamiltonian is the generic model for tight-binding Hamiltonians.
In its simplest form it is the workhorse for strongly correlated electrons,
in the same way as the Ising model is used for "spin" systems. However, it can
be generalized via:

- Additional interactions: in the extended Hubbard model, nearest-neighbor
(nn) interactions $V \sum n_{i,s} n_{j,s'}$ (i is nearest neighbor to j) are
included.

- Additional hopping terms ($t'$ hopping): hopping between next-nearest-neighbor
(nnn) sites modifies the kinetic energy and the density of states.

- Additional bands: the 3-band Hubbard model (Emery model) is supposed
to describe the electronic structure of the CuO planes in high-temperature
superconductors (different hopping terms between copper and oxygen sites,
different interaction strength on copper and oxygen sites, different chemical
potentials on the copper and oxygen sites).



Fig. 2. Panels: nearly empty band (free electrons), half-filled band
(antiferromagnet), nearly full band (free holes). For low electron densities,
collisions between the electrons in the Hubbard model are improbable, so they will
behave as free particles. For medium densities (near half filling) and "strong"
interaction, the Hamiltonian exhibits antiferromagnetism. For nearly full bands,
the "hopping" of electrons leads to doubly occupied sites, which is energetically
costly and therefore improbable; the system becomes insulating.



1.2 What to Compute 

Depending on what physical phenomena one is interested in, different observables 
can be computed using quantum Monte Carlo (QMC) simulations: 

- Correlation functions in real space indicate whether the net interaction is
repulsive or attractive, and whether there is any ordering in the system.

- Magnetic structure factors indicate the magnetic ordering of the system.

- The distribution n(k) gives the "Fermi surface" and indicates how important
many-particle effects are (see Fig. 3).

- The dependence of the filling on the chemical potential and the interaction
strength can be computed directly using QMC.

- Densities of states using Green's functions in imaginary time can also be
computed, but the details of these computations would lead too far (see
Sect. 4).

When statistical Monte Carlo methods are used, only systems in thermal equi- 
librium can be simulated, so that currents and super-currents (where the re- 
sistance of the conductor vanishes) cannot be observed directly. Therefore, the 
breakdown of the resistivity and the Meissner effect cannot be verified in the 
QMC simulations. 



1.3 Quantum Simulations 

Many analytical and numerical methods have been used in connection with the 
Hubbard model (for a didactic introduction see [2], Chapt. 20; for a more technical 
introduction to the analytic methods see [3]). 

[Figure 3, two panels: <n_k> for U=0; <n_k> for U=4] 

Fig. 3. $n(k)$ for an $8 \times 8$ system for $U = 0$ and $U = 4$ at $\beta = 4$, after [1]. Due to the 
depletion of the distribution for small $|k|$ at $U = 4$, the electrons cannot be treated as 
a Fermi liquid 



A general problem in the simulation of quantum systems is to formulate the 
problem in such a way that, instead of handling operators, one deals with 
matrix elements, which are real numbers. Exact diagonalization is a “conceptually 
straightforward” approach (the numerical realization is less straightforward, see 
[4] and references therein) but works only for system sizes of about $4 \times 4$ lattice 
points. A new concept is stochastic diagonalization, whereby states are sampled 
by a Monte Carlo procedure [5]. 

To set up a quantum Monte Carlo simulation, we need 

1. a procedure to deal with the noncommuting parts of the Hamiltonian 
(Trotter-Suzuki decomposition); 

2. a procedure to decouple the interaction, so that we obtain effective 
single-particle states for a fixed configuration (Hubbard-Stratonovich 
transformation); 

3. a Boltzmann weight, which can be obtained from the partition function; 

4. a method to evaluate the matrix elements for the computation of observables 
(real numbers on the computer). For our algorithm, the most useful of these 
matrix elements are the single-particle Greens functions. 

The Boltzmann weight for the algorithm that is introduced in the following 
section is computed as a determinant of matrices, so that the algorithm is 
called the determinantal quantum Monte Carlo (DQMC) algorithm. 

2 Grand Canonical Quantum Monte Carlo 

In this section, the grand canonical quantum Monte Carlo (GQMC) algorithm 
will be explained. This algorithm allows the simulation of the Hubbard model 
at finite temperatures in the grand canonical ensemble. 

2.1 The Trotter-Suzuki Transformation 

The Hubbard Hamiltonian consists of a kinetic term and an interaction term 
(with chemical potential). The computation of the partition function at finite 
temperature, 

$$Z = \mathrm{Tr}\left( e^{-\beta H} \right) ,$$

involves the computation of an operator exponential. 

In quantum physics one frequently has to deal with operators of the form 
$\exp(-\beta(A + B))$, where $A$ and $B$ are hermitian parts of a Hamiltonian and $\beta$ is real 
or complex. Typical examples are the partition function or Greens functions. 
Usually it is easy to find a basis where $A$ or $B$ is diagonal, but if $A$ and $B$ do 
not commute there will be no tractable diagonal basis for $A + B$. 

The Trotter-Suzuki (TS) transformation, also called the generalized Trotter 
formula or Lie-Trotter decomposition, is a method for decomposing the above 
exponential operators. It was introduced into computer simulations by M. Suzuki 
[6]: 

$$e^{-\beta(A+B)} = \lim_{n \to \infty} \left( e^{-\beta A/n} e^{-\beta B/n} \right)^n \qquad (3)$$

for arbitrary, bounded operators $A$, $B$. 

For the numerical implementation the limit ($n \to \infty$) in (3) must be 
approximated by a finite discretization. The exponent with the prefactor $\beta$ is 
decomposed into a product with small $d\tau$; the factors are called slices: 

$$e^{-\beta(A+B)} \approx \left( e^{-d\tau A} e^{-d\tau B} \right)^n , \quad d\tau = \beta/n . \qquad (4)$$

The error can be computed by inserting a parameter $\lambda$ into the exponent. 
Afterwards the expression is replaced by the integral over the derivative with respect 
to $\lambda$, and the norm is estimated. One can show that [7] 

$$e^{d\tau(A+B)} = e^{d\tau A} e^{d\tau B} + O(d\tau^2) , \qquad (5)$$

with Landau's symbol $O$ for the order of magnitude. This approximation is 
called “first order” because it is correct up to order $\propto d\tau$. A decomposition of 
second order (correct up to order $\propto d\tau^2$) is 

$$e^{d\tau(A+B)} \approx e^{d\tau A/2} e^{d\tau B} e^{d\tau A/2} . \qquad (6)$$



In contrast to the decomposition of first order, this decomposition yields real 
symmetric exponentials for real symmetric A,B. 

It is a general feature of numerical approximations (also for integrations, 
differentiations, solutions of partial differential equations) that the symmetric 
formulae are of higher accuracy than the asymmetric ones. 

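If one wants to increase the accuracy for a decomposition of a certain order, 
numerical costs increase. Nevertheless, for long products the necessary work for a 
decomposition of first order is nearly the same as for a decomposition of second 
order: 

first order: 

$$e^{n\,d\tau(A+B)} \approx \underbrace{e^{d\tau A} e^{d\tau B}}_{\text{first slice}} \; \underbrace{e^{d\tau A} e^{d\tau B}}_{\text{second slice}} \cdots \underbrace{e^{d\tau A} e^{d\tau B}}_{n\text{th slice}} , \qquad (7)$$

second order: 

$$e^{n\,d\tau(A+B)} \approx \underbrace{e^{d\tau A/2} e^{d\tau B} e^{d\tau A/2}}_{\text{first slice}} \; \underbrace{e^{d\tau A/2} e^{d\tau B} e^{d\tau A/2}}_{\text{second slice}} \cdots \underbrace{e^{d\tau A/2} e^{d\tau B} e^{d\tau A/2}}_{n\text{th slice}}$$

$$= e^{d\tau A/2} e^{d\tau B} e^{d\tau A} e^{d\tau B} e^{d\tau A} \cdots e^{d\tau B} e^{d\tau A/2} . \qquad (8)$$

To evaluate the first-order decomposition in (7) one needs $2n - 1$ products, 
and for the second-order decomposition in (8) one needs $2n$ products because 
$e^{d\tau A/2} e^{d\tau A/2} = e^{d\tau A}$. For the physically interesting case of large $n$ (low 
temperatures) this is practically the same for first as for second order. 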
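
The statements about accuracy and cost can be checked numerically. The following OCTAVE sketch is not one of the example programs of this chapter; the $2 \times 2$ matrices, the value of $\beta$ and all variable names are arbitrary choices for illustration. It uses the sign convention of (4), compares the relative errors of the first- and second-order products, and checks that the second-order product follows from the first-order one by two additional multiplications.

```octave
% Sketch: first- versus second-order Trotter-Suzuki decomposition.
% A, B, beta and all variable names are illustrative assumptions.
A = [0 1; 1 0];                  % real symmetric
B = [1 0; 0 -1];                 % real symmetric, does not commute with A
beta = 1;
exact = expm(-beta*(A + B));
nlist = [10 20 40];
err1 = zeros(size(nlist));  err2 = zeros(size(nlist));  errw = zeros(size(nlist));
for k = 1:numel(nlist)
  n = nlist(k);
  dtau = beta/n;
  first  = (expm(-dtau*A) * expm(-dtau*B))^n;                      % n first-order slices
  second = (expm(-dtau*A/2) * expm(-dtau*B) * expm(-dtau*A/2))^n;  % n second-order slices
  % "wrapping": second-order product obtained from the first-order one
  wrapped = expm(dtau*A/2) * first * expm(-dtau*A/2);
  err1(k) = norm(exact - first)  / norm(exact);
  err2(k) = norm(exact - second) / norm(exact);
  errw(k) = norm(wrapped - second) / norm(second);
  printf("n = %2d   first: %.2e   second: %.2e   wrapped - second: %.1e\n", ...
         n, err1(k), err2(k), errw(k));
end
```

Doubling $n$ should roughly halve the first-order error and reduce the second-order error by about a factor of four, while the wrapped product should agree with the second-order product up to rounding errors.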

Several verbalizations are commonly used to describe the occurrence of the 
product. The number of the slice is considered as the “coordinate” in an 
additional dimension. Sometimes, the TS formula is called an “equivalence theorem”: “an 
$m$-dimensional quantum system mapped onto an $(m + 1)$-dimensional classical 
system.” The system is classical inasmuch as there are only commuting 
operators (for large $n$) left. Nevertheless, the term “classical system” does not 
mean that the system can be mapped onto something like a harmonic oscillator 
or another classical “workhorse” of theoretical physics in high dimensions. The 
coordinate along the $(m + 1)$th dimension, which labels the “slices”, is also called 
Trotter time, which is imaginary time for many quantum simulations. 

Another verbalization is [6], “The additional dimension plays the role of path 
integrals in a discrete space . . . ”, so that formally, all QMC algorithms can be 
viewed as numerical path integral methods. Nevertheless, the dynamics and the 
implementation of different algorithms vary considerably. 

A decomposition of fourth order can be found in [7]; one decomposition up 
to fourth order that does not rely on the troublesome computation of a double 
commutator is the so-called “fractal” decomposition [8]. Very often, the 
Baker-Hausdorff formula (see [9]) 

$$e^{A} e^{B} = e^{A + B + \frac{1}{2}[A,B] + \cdots}$$

is used in high-energy physics in the same sense as the Trotter-Suzuki 
decomposition is used in solid-state physics. 

2.2 The Hubbard-Stratonovich Transformation 

In DQMC, the Hubbard-Stratonovich transformation (HS) is used to decouple 
the interaction and thereby to transform the many-particle system with interac- 
tion into a problem without interaction. The transformation is also called auxil- 
iary field transformation, because auxiliary potentials are built up on the lattice 
in such a way that the fluctuating potential is able to model the interaction U 
for the many-particle problem. 

Instead of simulating interacting electrons, one simulates electrons with spin 
up/down in a fluctuating potential. The sign of the potential is determined by the 
sign of the Hubbard-Stratonovich spin $\sigma$ (Fig. 4). One has to sum over several 
configurations of this potential to obtain the effect of the interaction, 

$$\sum_{\sigma} e^{d\tau V_\uparrow(\sigma) n_\uparrow} e^{d\tau V_\downarrow(\sigma) n_\downarrow} \quad \text{instead of} \quad e^{-d\tau U n_\uparrow n_\downarrow} .$$

[Figure 4: a single lattice with interacting fermions is replaced, via the Hubbard-Stratonovich transformation, by potentials for ↑-particles and potentials for ↓-particles, set by HS spins such as +1, -1, -1, +1] 

Fig. 4. Decoupling of the interaction $U$ using the Hubbard-Stratonovich 
transformation: Instead of simulating one grid with interacting spin-up and spin-down particles, 
one simulates two grids where the particles with spin-up/spin-down move in a 
fluctuating potential. The prefactor of the potentials of size $\lambda$ depends on the spin of the 
electrons (up, down) and the Hubbard-Stratonovich spins $\sigma_i = \pm 1$ on site $i$ 

The Hubbard-Stratonovich transformation (HS) for a single site can be 
derived in the following way. We will show that 

$$e^{-d\tau U n_\uparrow n_\downarrow} = \frac{1}{2} \sum_{\sigma = \pm 1} e^{\lambda \sigma (n_\uparrow - n_\downarrow)} e^{-d\tau U/2\,(n_\uparrow + n_\downarrow)} , \qquad (9)$$

where $\lambda$ is a free parameter. Comparing both expressions in Table 1 for all 
different values of the electron densities $n_\uparrow, n_\downarrow \in \{1, 0\}$, one obtains ^ 

$$\cosh(\lambda) = e^{d\tau U/2} . \qquad (10)$$

For more sites, a site index $i$ has to be introduced; the exponential of an 
operator becomes the exponential of a diagonal matrix of operators. 

In [1] and in the remainder of this chapter, the notation $\mathrm{Tr}_\sigma$ is used instead of $\frac{1}{2}\sum_{\sigma = \pm 1}$. 

To avoid the occurrence of a complex $\lambda$, different HS transformations are 
necessary for positive and negative $U$. For negative $U$, (10) has no real solution. 
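
The single-site identity (9) and the condition (10) can be verified by evaluating both sides for the four possible occupations. The following OCTAVE lines are a minimal sketch; the values of dtau and U and the variable names are illustrative assumptions.

```octave
% Sketch: numerical check of the single-site HS identity (9) with (10).
dtau = 0.125;  U = 4;                       % illustrative values, U > 0
lambda = acosh(exp(dtau*U/2));              % solves cosh(lambda) = exp(dtau*U/2)
maxdiff = 0;
for nup = 0:1
  for ndn = 0:1
    lhs = exp(-dtau*U*nup*ndn);
    rhs = 0;
    for sigma = [-1 1]
      rhs = rhs + 0.5 * exp(lambda*sigma*(nup - ndn)) * exp(-dtau*U/2*(nup + ndn));
    end
    printf("n_up = %d  n_dn = %d   lhs = %.12f   rhs = %.12f\n", nup, ndn, lhs, rhs);
    maxdiff = max(maxdiff, abs(lhs - rhs));
  end
end
% the same lambda in the form lambda = 2*atanh(sqrt(tanh(dtau*U/4)))
lambda_alt = 2*atanh(sqrt(tanh(dtau*U/4)));
printf("lambda = %.12f   alternative form = %.12f\n", lambda, lambda_alt);
```

For negative U the argument of acosh drops below 1 and lambda becomes complex, which is the reason why a different transformation is needed in that case.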

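^ In [1], this is written as $\lambda = 2\,\mathrm{atanh}\sqrt{\tanh(d\tau U/4)}$; “atanh” is misprinted as “atan”. 

Table 1. Evaluation of (9) 

                                          $e^{-d\tau U n_\uparrow n_\downarrow}$      $\frac{1}{2}\sum_{\sigma=\pm 1} e^{\lambda\sigma(n_\uparrow - n_\downarrow)} e^{-d\tau U/2\,(n_\uparrow + n_\downarrow)}$ 

$n_\uparrow = 1$, $n_\downarrow = 1$      $e^{-d\tau U}$                              $e^{-d\tau U}$ 
$n_\uparrow = 0$, $n_\downarrow = 1$      $1$                                         $\cosh(\lambda)\, e^{-d\tau U/2}$ 
$n_\uparrow = 1$, $n_\downarrow = 0$      $1$                                         $\cosh(\lambda)\, e^{-d\tau U/2}$ 
$n_\uparrow = 0$, $n_\downarrow = 0$      $1$                                         $1$ 

The advantages of this decoupling strategy are: 
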

- The interacting system is reduced to a noninteracting system with potentials 
for a fixed configuration of $\sigma$, so that the Wick theorem can be applied. This 
can be seen by separating the right-hand side expression in (9) for spin-up 
and spin-down particles: 

$$\frac{1}{2} \sum_{\sigma = \pm 1} e^{\lambda\sigma(n_\uparrow - n_\downarrow)} e^{-d\tau U/2\,(n_\uparrow + n_\downarrow)} = \frac{1}{2} \sum_{\sigma = \pm 1} e^{(\lambda\sigma - d\tau U/2)\, n_\uparrow} e^{(-\lambda\sigma - d\tau U/2)\, n_\downarrow} .$$

- It is not necessary to sum over all $\sigma$ configurations; it is sufficient to 
sum over the most important ones using Monte Carlo. In the literature, 
the summation is often called “integrating out the interaction degrees of 
freedom”. 

The Hubbard-Stratonovich spins are very often termed “Ising spins” or “Ising 
fields”. The terminology is slightly misleading, because the dynamics of the 
DQMC algorithm is in no way related to the dynamics of an Ising simulation. 
Although the configurations of the Ising spins are $\pm 1$, the resulting fluctuating 
potentials are elements of a diagonal matrix which enters a matrix product; 
they do not interact in the same way as the spins of a spin model. 

In the special case of the interaction on discrete lattice points, the HS trans- 
formation can be written as a sum. For continuous fields, there is an integral 
instead of a sum. In the case of the grand canonical ensemble, the chemical 
potential is added in the exponent to control the filling. The HS transformation 
may only be applied after the TS transformation, so that every slice has its own 
spin configuration. The fluctuating potentials correspond to a field which helps 
to model the interaction, so that the Hubbard-Stratonovich transformation in 
the QMC algorithms is also called the auxiliary field method. 



2.3 The Partition Function 

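The Boltzmann weight for a fixed configuration of HS spins can be derived from 
the partition function. In the derivations, the identity matrix will be denoted as 
$1$. The hopping matrix will be denoted as $K$. 

First, the exponential of the Hamiltonian is rewritten as a product of slices 
(11), which is just the definition of the exponential. Then the slices are 
decomposed using the Trotter-Suzuki decomposition (12); here $H_K$ denotes the kinetic 
term and $H_U$ the interaction term (with chemical potential). In (13) the interaction is 
decoupled using $\sigma_{i,l}$ as the Hubbard-Stratonovich spin on the $l$th slice on site $i$, 
and in (14) the decomposition is separated for spin-up and spin-down particles: 

$$Z = \mathrm{Tr}\left( e^{-\beta H} \right) = \mathrm{Tr} \prod_{l=1}^{m} e^{-d\tau H} \qquad (11)$$

$$\approx \mathrm{Tr} \prod_{l=1}^{m} e^{-d\tau H_K} e^{-d\tau H_U} \qquad (12)$$

$$= \mathrm{Tr}_\sigma \mathrm{Tr} \prod_{l=1}^{m} e^{-d\tau H_K} \prod_i e^{(+\lambda\sigma_{i,l} - d\tau(\mu - U/2))\, n_{i\uparrow}} e^{(-\lambda\sigma_{i,l} - d\tau(\mu - U/2))\, n_{i\downarrow}} \qquad (13)$$

$$= \mathrm{Tr}_\sigma \mathrm{Tr} \left[ \prod_{l=1}^{m} e^{d\tau c_\uparrow^\dagger K c_\uparrow} e^{\sum_i (+\lambda\sigma_{i,l} - d\tau(\mu - U/2))\, n_{i\uparrow}} \right] \left[ \prod_{l=1}^{m} e^{d\tau c_\downarrow^\dagger K c_\downarrow} e^{\sum_i (-\lambda\sigma_{i,l} - d\tau(\mu - U/2))\, n_{i\downarrow}} \right] . \qquad (14)$$

With the notation for the “slices” $B_l^\uparrow$, $B_l^\downarrow$, 

$$B_l^\uparrow = e^{d\tau K} e^{V^\uparrow(l)} , \quad V^\uparrow_{ij}(l) = \delta_{ij} \left( +\lambda\sigma_{i,l} - d\tau(\mu - U/2) \right) , \qquad (15)$$

$$B_l^\downarrow = e^{d\tau K} e^{V^\downarrow(l)} , \quad V^\downarrow_{ij}(l) = \delta_{ij} \left( -\lambda\sigma_{i,l} - d\tau(\mu - U/2) \right) , \qquad (16)$$

one can define the operators 

$$D_l^\uparrow = e^{d\tau c_\uparrow^\dagger K c_\uparrow} e^{c_\uparrow^\dagger V^\uparrow(l) c_\uparrow} , \quad D_l^\downarrow = e^{d\tau c_\downarrow^\dagger K c_\downarrow} e^{c_\downarrow^\dagger V^\downarrow(l) c_\downarrow} , \qquad (17)$$

so that the partition function can be written as 

$$Z = \mathrm{Tr}_\sigma \mathrm{Tr} \left( \prod_l D_l^\uparrow \right) \left( \prod_l D_l^\downarrow \right) . \qquad (18)$$
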



The bilinear forms in fermion operators $c_i^\dagger \ldots c_j$ can be formally “diagonalized” 
so that the trace over the fermions can be taken explicitly; this takes up the 
whole appendix in [1]. 

Finally, one can prove that one can compute the partition function by 
computing the determinant of the matrices $B_l$ instead of computing the trace of 
the operator exponentials $D_l$. In equation (19), $(\det O_{\uparrow,\sigma})(\det O_{\downarrow,\sigma})$ is introduced 
as a convenient abbreviation for the “statistical weight” of a $\sigma$ configuration of 
Hubbard-Stratonovich spins. 

$$Z = \mathrm{Tr}_\sigma \mathrm{Tr} \left( \prod_l D_l^\uparrow \right) \left( \prod_l D_l^\downarrow \right)$$

$$= \mathrm{Tr}_\sigma \det\left( 1 + B_m^\uparrow B_{m-1}^\uparrow \cdots B_1^\uparrow \right) \det\left( 1 + B_m^\downarrow B_{m-1}^\downarrow \cdots B_1^\downarrow \right)$$

$$= \mathrm{Tr}_\sigma (\det O_{\uparrow,\sigma})(\det O_{\downarrow,\sigma}) . \qquad (19)$$
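
The replacement of the fermion trace by a determinant can be made plausible for a small noninteracting system. The following OCTAVE sketch is an illustration under stated assumptions, not part of the example programs: HS fields and chemical potential are switched off, only one spin direction is considered, and the product of the slices is taken to be the single matrix exponential expm(beta*K) for a periodic chain of four sites. The trace is evaluated by brute force over all occupation patterns of the eigenmodes of $K$.

```octave
% Sketch: trace over all fermion states versus determinant formula.
% Assumptions: no HS fields, mu = 0, one spin direction, K_ij = t for neighbors.
N = 4;  t = 1;  beta = 2;                 % illustrative values
K = zeros(N);
for i = 1:N
  j = mod(i, N) + 1;                      % right neighbor on a periodic chain
  K(i, j) = t;  K(j, i) = t;
end
Zdet = det(eye(N) + expm(beta*K));        % determinant formula
kappa = eig(K);                           % eigenvalues of the hopping matrix
Ztrace = 0;
for state = 0:2^N - 1                     % all 2^N occupation patterns
  occ = bitget(state, 1:N);               % occupation numbers (0 or 1) of the eigenmodes
  Ztrace = Ztrace + exp(beta * (occ * kappa));
end
printf("determinant: %.10g   explicit trace: %.10g\n", Zdet, Ztrace);
```

The explicit sum grows as $2^N$, whereas the determinant costs only of the order of $N^3$ operations, which is what makes the determinantal formulation usable for larger lattices.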



2.4 The Monte Carlo Weight 

We will not give a justification of the Monte Carlo method (for that purpose, 
see [10]), but in Table 2 we show the analogies between classical and quantum 
Monte Carlo methods in the derivation of the statistical weight. The 
heat-bath algorithm is usually used for DQMC simulations ($P_{hb}$ in Table 2). 

The product $\det O_\uparrow(\sigma) \det O_\downarrow(\sigma)$ may be negative; therefore the Boltzmann 
weight for a HS configuration $\sigma$ is defined using the absolute value as 

$$P(\sigma) = \left| \det O_\uparrow \det O_\downarrow \right| .$$

Consequently, in order to compute an observable $\langle A \rangle$ one has to compute separately 
the averages $\langle A^+ \rangle$ and $\langle A^- \rangle$ over the configurations with positive and negative 
statistical weights, and with the percentage of positive and negative statistical 
weights, $\langle \mathrm{Sign}^+ \rangle$ and $\langle \mathrm{Sign}^- \rangle$, one can compute an observable as 

$$\langle A \rangle = \frac{\langle A^+ \rangle \langle \mathrm{Sign}^+ \rangle - \langle A^- \rangle \langle \mathrm{Sign}^- \rangle}{\langle \mathrm{Sign}^+ \rangle - \langle \mathrm{Sign}^- \rangle} .$$

Computing observables without the use of this “minus sign” would lead to 
nonphysical observables (see Fig. 5). 
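
The bookkeeping for the sign can be illustrated with a handful of made-up numbers. In the following OCTAVE sketch the vectors A (measurements) and s (signs of the statistical weights) are hypothetical toy data, not simulation results; the sketch assumes that the averages for positive and negative weights are taken over the respective configurations only.

```octave
% Sketch: sign-weighted average of an observable (toy data, not simulation output).
A = [0.52 0.48 0.61 0.45 0.50 0.58];    % hypothetical measurements
s = [ 1    1   -1    1    1   -1  ];    % hypothetical signs of the weights
pos = (s > 0);  neg = (s < 0);
Aplus  = sum(A(pos)) / sum(pos);        % average over positive-weight configurations
Aminus = sum(A(neg)) / sum(neg);        % average over negative-weight configurations
signplus  = sum(pos) / numel(s);        % fraction of positive weights
signminus = sum(neg) / numel(s);        % fraction of negative weights
A_signed = (Aplus*signplus - Aminus*signminus) / (signplus - signminus);
A_direct = sum(A .* s) / sum(s);        % equivalent form: <A*sign> / <sign>
A_naive  = sum(A) / numel(A);           % average that ignores the sign
printf("with sign: %.4f   direct: %.4f   sign ignored: %.4f\n", A_signed, A_direct, A_naive);
```

When the fractions of positive and negative weights become nearly equal, the denominator approaches zero and the statistical error of the result grows accordingly.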



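Table 2. Analogies between classical and quantum Monte Carlo methods in the 
derivation of the statistical weight 

                           Classical MC                                            Quantum MC 

States to sample           classical state $i$                                     $\sigma$ (HS spins) 

Partition function $Z$     $\sum_i e^{-\beta E_i}$                                 $\mathrm{Tr}_\sigma (\det O_{\uparrow,\sigma})(\det O_{\downarrow,\sigma})$ 

Boltzmann weight           $e^{-\beta E_i}$ for $i$                                $(\det O_{\uparrow,\sigma})(\det O_{\downarrow,\sigma})$ for $\sigma$ 

Subsequent states in MC    $i \to i'$                                              $\sigma \to \sigma'$ 

Relative probabilities     $P(i \to i') = \frac{e^{-\beta E_{i'}}}{e^{-\beta E_i}}$    $P(\sigma \to \sigma') = \frac{(\det O_{\uparrow,\sigma'})(\det O_{\downarrow,\sigma'})}{(\det O_{\uparrow,\sigma})(\det O_{\downarrow,\sigma})}$ 

Heat bath algorithm        $P_{hb} = \frac{P(i \to i')}{1 + P(i \to i')}$          $P_{hb} = \frac{P(\sigma \to \sigma')}{1 + P(\sigma \to \sigma')}$ 

[Figure 5: Histogram for n for U=8 beta=3 with minus sign] 

Fig. 5. Distribution of the $n_i$ for $U = 8$, $\beta = 3$ with correct (first part of figure) and 
wrong (second part of figure) use of the sign. Nonphysical “tails” in the distribution of 
the electron density occur outside [0, 1] 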



3 Equal-Time Greens Functions 

In the following derivations the index for the electron spin will be omitted if the 
quantity is evaluated for both spin directions in the same sense. Thermodynamic 
averages, which are usually denoted by $\langle \ldots \rangle$ in textbooks, are physical quantities. 
Monte Carlo measurements for a fixed HS configuration are written as $\langle \ldots \rangle_\sigma$ below; 
they are not physical quantities, only their average is. ^ 

^ In [1], this is expressed using the notation of $\langle \ldots \rangle$ for nonphysical “single-bracket 
averages” and physical “double-bracket averages” $\langle\langle \ldots \rangle\rangle$. 

The equal-time Greens function $\langle c_i c_j^\dagger \rangle_\sigma$ in the grand canonical algorithm can 
be derived as 

$$\langle c_i c_j^\dagger \rangle_\sigma = \left[ \frac{1}{1 + B_m B_{m-1} \cdots B_1} \right]_{ij} = G(m)_{ij} , \qquad (20)$$

where $G(m)$ is an abbreviation for the matrix with the entries $\langle c_i c_j^\dagger \rangle_\sigma$ for all $i$ 
and $j$. The above operations can also be visualized by a block diagram, in which one 
symbol stands for the diagonal matrix of the HS fields and another for the full square 
matrix of the exponential of the hopping matrix. 

[Block diagram: the matrix of the Greens function as the inverse of 1 plus the product of full (hopping) and diagonal (HS field) matrices] 

For the computation of the electron densities and spatial correlation functions, 

$$\langle c_i^\dagger c_j \rangle_\sigma = \delta_{ij} - \langle c_j c_i^\dagger \rangle_\sigma$$

is computed. The Greens functions will be referred to as matrix $G(m)$ for 
convenience. From the $\langle c_i c_j^\dagger \rangle_\sigma$ or $\langle c_i^\dagger c_j \rangle_\sigma$ the equal-time Greens functions in 
momentum space can be obtained via the Fourier transform. For isotropic systems, 
the Greens functions in real space for a fixed HS configuration are averaged over 
corresponding lattice sites before they are Fourier transformed, to save storage. 
$G$ will be used to denote the Greens functions on a general slice. 
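
For a system without interaction the relation between the Greens function and the densities can be checked directly. The following OCTAVE sketch is an illustration only: it assumes a periodic chain of four sites, no HS fields, zero chemical potential, and slices of the form expm(dtau*K) with K_ij = t for neighboring sites. In that case 1 - G must reproduce the Fermi function of the matrix -K, and every site is half filled.

```octave
% Sketch: equal-time Greens function without interaction (U = 0, mu = 0).
% The sign convention exp(+dtau*K) with K_ij = t is an assumption of this sketch.
N = 4;  t = 1;  beta = 2;  m = 16;  dtau = beta/m;   % illustrative values
K = zeros(N);
for i = 1:N
  j = mod(i, N) + 1;                % periodic chain
  K(i, j) = t;  K(j, i) = t;
end
B = expm(dtau*K);                   % one slice; all slices are equal here
G = inv(eye(N) + B^m);              % G(m) = (1 + B_m ... B_1)^(-1)
dens = 1 - diag(G);                 % densities <n_i> = 1 - G_ii
% cross-check: 1 - G equals the Fermi function of the matrix -K
[V, E] = eig(-K);
fermi = V * diag(1 ./ (exp(beta*diag(E)) + 1)) * V';
printf("density on site %d: %.6f\n", [1:N; dens']);
```

With HS fields present, the slices differ from each other and the product has to be built up slice by slice, but the densities are read off from the diagonal of G in the same way.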

3.1 Single Spin Updates 

The straightforward computation of $G$ takes about $2mL^3$ flops, where $L$ denotes the 
dimension of the matrices $B_l$ (the number of lattice sites). If this product 
were recomputed for each Monte Carlo move with a new random HS configuration, 
a high percentage would be rejected and the algorithm would become very 
costly. Due to the diagonal nature of the interaction term, it is possible to 
decide whether a single HS spin has to be flipped, and how this changes the Greens 
function, without recomputing the whole expression for $G$. This takes $\propto L^2$ flops 
[11]. 
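
The idea behind the single spin update can be sketched with a rank-one update. The following OCTAVE lines are an illustration under stated assumptions, not the update routine of the example programs: two slices on a periodic chain of four sites, slices of the form expm(dtau*K)*diag(exp(lambda*s)) for spin-up electrons, no chemical potential, and a spin flip on the rightmost slice of the product. Flipping one HS spin multiplies one diagonal entry by exp(-2*lambda*s), so that the ratio of the determinants and the new Greens function follow from the old Greens function alone.

```octave
% Sketch: single HS spin flip on the rightmost slice, checked against a full
% recomputation. All names and parameter values are illustrative assumptions.
N = 4;  t = 1;  dtau = 0.125;  lambda = 0.5;
K = zeros(N);
for i = 1:N
  j = mod(i, N) + 1;                         % periodic chain
  K(i, j) = t;  K(j, i) = t;
end
Id = eye(N);
s1 = [1 -1 -1 1];                            % HS spins on slice 1 (rightmost)
s2 = [-1 1 1 -1];                            % HS spins on slice 2
B1 = expm(dtau*K) * diag(exp(lambda*s1));    % slices for spin-up electrons,
B2 = expm(dtau*K) * diag(exp(lambda*s2));    % chemical potential omitted
G = inv(Id + B2*B1);

site = 2;                                    % flip the HS spin on this site of slice 1
delta = exp(-2*lambda*s1(site)) - 1;         % relative change of the diagonal entry
R = 1 + (1 - G(site, site)) * delta;         % ratio of the determinants
Gnew = G - (delta/R) * (Id(:, site) - G(:, site)) * G(site, :);   % O(N^2) update

% reference: recompute everything with the flipped spin
s1(site) = -s1(site);
B1new = expm(dtau*K) * diag(exp(lambda*s1));
Rref = det(Id + B2*B1new) / det(Id + B2*B1);
Gref = inv(Id + B2*B1new);
printf("R = %.12f (update)   %.12f (recomputed)   |Gnew - Gref| = %.1e\n", ...
       R, Rref, norm(Gnew - Gref));
```

The ratio R costs only a few operations, so a proposed flip can be accepted or rejected before any matrix is touched; the update of the Greens function is needed only for accepted flips.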



3.2 Numerical Instabilities 

There is a product in the denominator of 

$$G(m) = (1 + B_m B_{m-1} \cdots B_1)^{-1} ,$$

which is a product of exponentials of matrices. For $U = 0$ and a system size of 
$4 \times 4$ sites, the inverse of the Greens function for several $\beta$ can be seen in Fig. 6. 

For $\beta = 0.25$, $G$ is computed as the inverse of a diagonally dominant matrix, 
whereas for increasing $\beta$ the entries of the inverse of $G$ increase exponentially. 
For increasing $U$, the entries increase and the structure of the matrices gets lost. 

[Figure 6, panels: 1/G, beta=.25; 1/G, beta=1.] 

Fig. 6. Inverse of the Greens function for a $4 \times 4$ system without interaction ($U = 0$). 
The x, y coordinates are the rows and columns, the z coordinate is the matrix entry. 
Due to the viewpoint of the plotting routine, the upper left graph corresponds to a 
diagonal (dominant) matrix. The size of the matrix entries increases exponentially 
with increasing $\beta$ 

The numerically accurate computation of the inverse of $(1 + B_m \cdots B_1)$ is 
difficult for a number of reasons. For $U = 0$, the product $B_m \cdots B_1$ is just 
the exponential of the hopping matrix, with singular values distributed between 
$\exp(4\beta t)$ and $\exp(-4\beta t)$, where $t$ is the hopping element. The addition of the “1” 
constitutes a cutoff for all singular values smaller than 1: each singular value 
of size $\epsilon$ is replaced by a singular value of size $(\epsilon + 1)$. This means that for the 
computation of the inverse of $(1 + B_m \cdots B_1)$, the addition of the “1” cannot be 
considered as a small perturbation to $B_m \cdots B_1$. The inverse of $B_m \cdots B_1$ could 
be easily computed as $(B_m \cdots B_1)^{-1} = B_1^{-1} \cdots B_m^{-1}$, but its largest singular 
values are not present in $(1 + B_m \cdots B_1)^{-1}$. This means that 
numerical perturbation theory or related methods cannot be employed for this 
problem. 
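
The spread of the singular values can be inspected directly for the noninteracting $4 \times 4$ system. The following OCTAVE sketch is an illustration, not one of the example programs; it assumes periodic boundary conditions, K_ij = t for neighboring sites and zero chemical potential, so that the product of the slices reduces to expm(beta*K).

```octave
% Sketch: singular values of the product of slices for U = 0 on a 4 x 4 lattice.
% Periodic boundaries, K_ij = t and mu = 0 are assumptions of this sketch.
t = 1;  n = 4;
ring = t * (circshift(eye(n), 1) + circshift(eye(n), -1));   % periodic chain
K = kron(eye(n), ring) + kron(ring, eye(n));                 % 16 x 16 hopping matrix
betas = [0.25 1 4];
smax = zeros(size(betas));  smin = zeros(size(betas));
for k = 1:numel(betas)
  beta = betas(k);
  M = expm(beta*K);                  % B_m ... B_1 without interaction
  sv = svd(M);
  smax(k) = max(sv);  smin(k) = min(sv);
  printf("beta = %4.2f   largest: %.3e   smallest: %.3e   cond(1 + M): %.3e\n", ...
         beta, smax(k), smin(k), cond(eye(n^2) + M));
end
```

The ratio of the largest to the smallest singular value grows as $\exp(8\beta t)$; once it approaches the inverse of the machine precision, the small singular values of a product computed by plain matrix multiplication are no longer reliable.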



4 History and Further Reading 

The Monte Carlo method was introduced by Metropolis et al. forty years ago 
[12]. M. Suzuki introduced the Trotter decomposition and performed the first 
“real” quantum Monte Carlo simulation on a quantum spin system in 1977 [6]. 

The worldline algorithm for fermion simulations was presented by Hirsch et 
al. [13]. It looks conceptually easier than the determinantal method presented 
in this chapter, but has severe drawbacks if it is used for “realistic” 
simulations. Most simulations for the Hubbard model in the field of high-temperature 
superconductivity use the determinantal algorithms. 

The grand canonical algorithm was introduced by R. Blankenbecler, D. J. 
Scalapino and R. L. Sugar in [14]. Intended for use in field theory, the paper 
uses the Lagrangian formalism and Grassmann variables. The paper is hard to read 
for solid-state people, but it is a standard reference. The algorithm was adapted for 
the Hubbard model by Hirsch in 1985 in [1], and this paper is still the “bible” for 
QMC simulations for the Hubbard model at finite temperatures. Earlier, Hirsch 
presented the discrete form of the Hubbard-Stratonovich transformation, which 
speeds up the simulation and makes the numerics simpler [15]. 

The treatment of the numerical instabilities is presented in [16], the treatment 
of the minus sign in [17]. The most complete introduction to the zero-temperature 
(projector) algorithm (canonical ensemble) can be found in [18]. The projection 
operator filters the ground state from a test function $|T\rangle$. This can be seen 
from the energy representation of the problem, in which all higher states $|n\rangle$, 
$n > 0$, are exponentially suppressed: 

$$e^{-\beta H} |T\rangle = e^{-\beta H} \sum_n \langle n|T\rangle \cdot |n\rangle$$

$$= e^{-\beta E_0} \left( \langle 0|T\rangle \cdot |0\rangle + \sum_{n>0} e^{-\beta(E_n - E_0)} \langle n|T\rangle \cdot |n\rangle \right) .$$

A nice review, which presents the concepts of zero- and finite-temperature 
algorithms, can be found in [11]. The formulae for the computation of time-dependent 
Greens functions for finite temperatures are given in [1], while for the 
ground-state algorithm one may refer to [19]. Methods to extract dynamical 
information (spectral functions, densities of states) from time-dependent Greens 
functions can be found in [19] and [20]. 

Appendix A: Statistical Monte Carlo Methods 

The program Montecarlotest.m computes the probability distribution for a 
certain phase space using statistical Monte Carlo methods. It is a didactical 
attempt to show “why Monte Carlo works”. A phase space (coordinate $i$) is set 
up: every state (point in the phase space) is assigned a certain “eigenenergy” 
$E_i$. At temperature $T$, the probability of the system occupying state $i$ is given 
by the Boltzmann factor 

$$P_i \propto e^{-E_i/(k_B T)} .$$

The phase space may be too large (e.g., continuous) to sample all possible states. 
Instead, one moves randomly from a coordinate $i$ to the coordinate $i'$ with the 
relative probability 

$$P(i \to i') = e^{-(E_{i'} - E_i)/(k_B T)} .$$

If $E_{i'} < E_i$, one moves in any case. In the program, the “move” in 
phase space is realized by randomly (using the random number generator rand) 
choosing a new coordinate itest from the vector x. The corresponding energies 
are stored in the vector energy. 

The “general” area of the walk is stored in the vector xbin. It can be seen that 
the entries in xbin (the probabilities of visiting a certain area in the phase space) 
correspond to the Boltzmann probability. Thus, the statistical Monte Carlo 
method is able to sample a phase space without using all possible configurations. 
It can also be seen that the accuracy of the method is limited by statistical 
fluctuations. To increase the accuracy, more sweeps have to be computed. 
The probabilities are not normalized. The commands which are commented 
out with % allow the creation of PostScript files. 
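
The procedure described above can be sketched in a few lines of OCTAVE. The following program is not Montecarlotest.m from the diskette but a minimal reconstruction of the idea; it reuses the names x, energy, itest and xbin from the text, while the energy landscape, the number of states, the temperature (in units with $k_B = 1$) and the number of sweeps are arbitrary assumptions.

```octave
% Sketch of a statistical Monte Carlo walk through a small phase space.
% Energy landscape and all parameter values are illustrative assumptions.
rand("state", 1);                        % fixed seed, for reproducibility only
nstates = 50;  T = 1;  nsweeps = 100000;
x = 1:nstates;                           % coordinates in phase space
energy = 2 + 2*sin(2*pi*x/nstates);      % arbitrary "eigenenergies" E_i
xbin = zeros(1, nstates);                % histogram of visited coordinates
i = 1;                                   % starting coordinate
for sweep = 1:nsweeps
  itest = x(randi(nstates));             % offer a coordinate from the whole phase space
  if rand() < exp(-(energy(itest) - energy(i))/T)
    i = itest;                           % move (always if the energy is lowered)
  end
  xbin(i) = xbin(i) + 1;
end
pmc = xbin / sum(xbin);                  % normalized Monte Carlo probabilities
pexact = exp(-energy/T) / sum(exp(-energy/T));
maxdev = max(abs(pmc - pexact));
printf("largest deviation from the Boltzmann probability: %.4f\n", maxdev);
```

Plotting pmc and pexact together, for example with plot(x, pmc, x, pexact), shows the statistical fluctuations of the Monte Carlo estimate around the Boltzmann probability.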

Apart from generating all “physical” configurations using an “infinitely” long 
Monte Carlo run, one has to take several things into account when using “finite” 
computer time. 

1. Thermalization. For a “realistic” simulation, before measurements can be 
made the thermodynamic system has to be in equilibrium. Imagine a ferro- 
magnetic spin system above the critical temperature: if you set up an initial 
configuration with ordered spins, you have to allow the computer some time 
to destroy this ordering to obtain a “physical” configuration. 

2. Independent configurations. If you move around in phase space, the mea- 
surements have to be taken at points “far enough apart” so that the mea- 
surements are not correlated (i.e., they are “statistically independent”). In 
Fig. 7, failure to achieve this would correspond to just sitting in an energy 
“valley” , for example near coordinate 44 and generating configurations in the 
neighborhood. In that case, the Monte Carlo probability distribution would 
just be a spike near 44, the system would not visit the whole phase space. 

3. Detailed balance. Apart from the fact that the whole phase space must 
be sampled, it must be sampled in such a way that the simulation does 
not create more offers to sample a certain region. In the example program 
Montecarlotest.m, the coordinate to be sampled is chosen randomly from 
the whole phase space. Choosing a “normal distribution” around some 
coordinate would violate detailed balance. 

4. Error bars. Monte Carlo simulations generate measurements with statistical 
error bars which decrease with the number of measurements $N_m$ as $1/\sqrt{N_m}$. 
This means that for useful results, sufficient Monte Carlo measurements have 
to be generated. 
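
The decrease of the error bars with the number of measurements can be illustrated with independent random numbers standing in for Monte Carlo measurements. The following OCTAVE sketch makes exactly this simplifying assumption; correlated measurements from a real simulation would give larger error bars for the same number of measurements.

```octave
% Sketch: statistical error of an average for two different numbers of
% measurements. Uniform random numbers stand in for the measurements.
rand("state", 7);                          % fixed seed, for reproducibility only
Nm = [400 40000];
avg = zeros(size(Nm));  err = zeros(size(Nm));
for k = 1:numel(Nm)
  samples = rand(Nm(k), 1);                % independent "measurements"
  avg(k) = mean(samples);
  err(k) = std(samples) / sqrt(Nm(k));     % statistical error of the average
  printf("Nm = %5d   average = %.4f +/- %.4f\n", Nm(k), avg(k), err(k));
end
ratio = err(1) / err(2);                   % should be close to sqrt(100) = 10
```

A factor of 100 in the number of measurements buys only one additional decimal digit of accuracy.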



[Figure 7, panels: Energy E; Probability P; (Non-normalized) Probability P with Monte Carlo; horizontal axis: coordinate in phase space i] 

Fig. 7. Energy $E$ in phase space, thermodynamic probability $P$ for an arbitrary phase 
space and relative probability of hitting this part of phase space using the Monte Carlo 
approach (below) 

Appendix B: OCTAVE 

Due to the large amount of linear algebra in the DQMC algorithms, the example 
programs are written in OCTAVE. OCTAVE is an interpreter for a 
matrix-oriented programming language. It is very fast on matrix operations, but because 
the code is interpreted instead of compiled it is comparatively slow on inner DO 
and IF operations. 

In OCTAVE, the visualization of data at any stage of the computation is 
possible. This allows fast debugging of complicated numerical codes because 
errors can be traced down using the resident data in the memory even after a 
program crash. 

There is a vast (and increasing) number of “intrinsic functions” which can 
be applied to two-dimensional arrays, such as matrix decompositions, Fourier 
transforms etc., which makes the “language” also attractive for the processing 
of output data of “serious” computer simulations. 

The language uses the overloading of operators in such a way that matrix 
and vector products are automatically recognized from the type of variables 
in real and complex arithmetic. Using the wrong type of data, such as row-wise 
instead of column-wise vectors, leads to error messages. The language can handle 
complex numbers in such a way that $\sqrt{-1}$ is automatically assigned the value 
$0 + i$. 

There are some slight drawbacks, due to the “simple” structure of the 
language: 

1. Only double precision is available, so that algorithms such as the Newton- 
Raphson algorithm for the iterative improvement of the solution of a set of 
equations cannot be computed in mixed precision. 

2. The language can only handle two-dimensional arrays, so that arrays of 
higher dimensions must be unrolled. 

3. Programming errors which result in the wrong kind of data structures, e.g., 
matrices instead of scalar values, are very often propagated through the 
program, because OCTAVE applies all following operations in the matrix 
sense. 

OCTAVE is a freely available program that is largely compatible with MATLAB, 
written by John W. Eaton at the Department of Chemical Engineering at the 
University of Texas. The source code is available via ftp at ftp.che.utexas.edu 
in the directory /pub/octave. Binaries are also available for most UNIX systems 
from other public domain file servers. 

Appendix C: Exercises 

The programming exercises are supposed to give an introduction to determinant 
quantum Monte Carlo methods from the point of view of matrix computations. 

Exercise 1: Matrix exponentials 

Ex. 1.1: (Hopping and boundary conditions) Set up the hopping matrix “by 
hand” for nearest neighbor hopping in 

a) 1 dimension for a system of N lattice sites, 

b) 2 dimensions for a system of Nx x Ny lattice sites. 

c) Write a program which is able to perform these tasks. 

The results should be visualized using the OCTAVE online graphics. 

Ex. 1.2: (Linear Algebra first year) If you don’t remember it, look up the 
theorem: “The eigenvalues of a hermitian/real symmetric matrix are real” in 
a reasonably good introductory text for linear algebra. Set up hermitian and 
real symmetric matrices of reasonable size and test this theorem experimentally 
using the eig function of OCTAVE. 

Ex. 1.3: Using Ex. 1.2, think of a way to compute the exponential of a 
matrix and write a program in OCTAVE. Verify your results using the built-in 
matrix-exponential functions expm, expm1, expm2, expm3, and read up the 
documentation on expm in the OCTAVE handbook (see Appendix B). If you 
have access to MATLAB, study the help file for these functions. 

Ex. 1.4: Write a program that compares the matrix exponential obtained 
by using the diagonalization with the matrix exponential obtained by using the 
Taylor series. 

Exercise 2: The Trotter-Suzuki decomposition 

Ex. 2.1: Set up a real symmetric matrix $A$. Compute $e^{-\beta A}$ for $\beta$ = 10, 1, 
0.1, 0.01 . . . What does the matrix look like for decreasing $\beta$? 

Ex. 2.2: Set up real symmetric matrices $A$, $B$. Compute $e^{-\beta(A+B)}$ for 
different $\beta$ directly and by using the TS decompositions of first and second order for 
different discretizations $d\tau = \beta/n$. Try to verify that more accurate 
decompositions give more accurate approximations for $e^{-\beta(A+B)}$. Compute the difference 
in the relative norm 

$$\frac{\| A_{\mathrm{exact}} - A_{\mathrm{approx}} \|}{\| A_{\mathrm{exact}} \|}$$

using the built-in OCTAVE function norm: normrel=norm(Aexact-Aapprox)/norm(Aexact). 
Why does the TS transformation “work”? What do $e^{-\beta A/n}$ and $e^{-\beta B/n}$ 
look like for large $n$? Remember that diagonal matrices commute. 

Exercise 3: The Grand Canonical Quantum Monte Carlo 

Ex. 3.1: Compare the lecture and the example program ghd2main.m and 
make sure that you understand which formulae correspond to which program 
parts. 

Ex. 3.2: (Filling) Set the interaction u to zero and run the program for one 
sweep. Modify the chemical potential mu and see how the filling depends on the 
chemical potential. Remember that the electron densities can be obtained from 
the diagonal of the matrices grup, grdn. The filling is given by the densities on 
site $i$, $\langle n_i \rangle = \langle c_i^\dagger c_i \rangle = \langle 1 - c_i c_i^\dagger \rangle$; $\langle 1 - c_i c_i^\dagger \rangle$ can be obtained from the 
diagonal elements of grup, grdn. 

Verify that you can use $\mu$=mu to fill ($\mu \to \infty$) or empty ($\mu \to -\infty$) the system. 

Ex. 3.3: Rewrite the code so that you obtain the Greens functions in second 
order instead of first order. Two additional matrix multiplications are enough. 
This can be accomplished either in compute_Greensf.m or after the computation 
of the Greens function by wrapping: 

$$G(m)^{\text{1st ord.}} = (1 + B_m B_{m-1} \cdots B_1)^{-1} ,$$

$$G(m)^{\text{2nd ord.}} = e^{-d\tau K/2} (1 + B_m B_{m-1} \cdots B_1)^{-1} e^{d\tau K/2} .$$

Ex. 3.4: (Antiferromagnetism) What is the effect on the diagonal elements 
in the case of half filling if the interaction U is increased? What is the physical 
result? 

Ex. 3.5: (Numerical Instabilities) Impose a symmetry on the HS spins and 
see whether you can find this symmetry in the Greens function. See how far you 
can increase $U$ and $\beta$ before the symmetry vanishes. 

Solutions to some of the exercises 

Ex. 1.1: See example program Hoppingmatrix.m 
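
For readers without access to the diskette, one possible way of setting up the matrices is sketched below. This is not the program Hoppingmatrix.m; the lattice sizes, periodic boundary conditions, the site numbering and the choice K_ij = t for neighboring sites are assumptions of this sketch (some texts use -t instead).

```octave
% Sketch for Ex. 1.1: nearest-neighbor hopping matrices with periodic boundaries.
% Sizes, numbering of the sites and the sign of the entries are assumptions.
t = 1;  N = 6;  Nx = 4;  Ny = 3;
% a) one dimension, set up "by hand" with a loop
K1 = zeros(N);
for i = 1:N
  j = mod(i, N) + 1;                       % right neighbor, site N connects to site 1
  K1(i, j) = t;  K1(j, i) = t;
end
% b) two dimensions, site index = x + (y-1)*Nx, built from one-dimensional rings
ring = @(n) t * (circshift(eye(n), 1) + circshift(eye(n), -1));   % needs n >= 3
K2 = kron(eye(Ny), ring(Nx)) + kron(ring(Ny), eye(Nx));
printf("1D: %d x %d matrix,  2D: %d x %d matrix\n", size(K1), size(K2));
```

The result can be inspected with mesh(K2); for open boundary conditions the entries that connect the first and the last site of each row and column have to be removed.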

Ex. 1.3: If matrix $A$ is real symmetric, it can be decomposed into $A = 
U \cdot E \cdot U^\dagger$ with unitary $U$ and real diagonal $E$. Therefore, $\exp(A) = U \cdot \exp(E) \cdot U^\dagger$, 
where $\exp(E)$ can be evaluated element-wise on the diagonal. (Proof: look at the Taylor series 
and remember that $U U^\dagger = 1$ for unitary matrices.) 

Ex. 1.4: You can also use the intrinsic functions expm2 (Taylor series) and 
expm3 (eigenvalue decomposition) from OCTAVE. 
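
Where the functions expm2 and expm3 are not available, both variants can be written out in a few lines. The following OCTAVE sketch uses an arbitrary $3 \times 3$ test matrix and a fixed number of Taylor terms; both are assumptions chosen for illustration, and the built-in expm serves as reference.

```octave
% Sketch for Ex. 1.3 and Ex. 1.4: matrix exponential by diagonalization and
% by Taylor series. Test matrix and number of terms are arbitrary choices.
A = [2 1 0; 1 3 1; 0 1 4];                   % real symmetric test matrix
[U, E] = eig(A);                             % A = U*E*U'
expA_eig = U * diag(exp(diag(E))) * U';      % exponential of the eigenvalues only
expA_taylor = eye(3);  term = eye(3);
for k = 1:40
  term = term * A / k;                       % A^k / k!
  expA_taylor = expA_taylor + term;
end
ref = expm(A);
d_eig    = norm(expA_eig - ref)    / norm(ref);
d_taylor = norm(expA_taylor - ref) / norm(ref);
printf("relative difference  eig: %.1e   Taylor: %.1e\n", d_eig, d_taylor);
```

For matrices with a large norm the plain Taylor series needs many terms and loses accuracy through cancellation, whereas the diagonalization remains reliable for real symmetric matrices.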

Ex. 2.1: A real symmetric random matrix can be set up in the following way: 

beta=0.1;              % example value, insert a proper value 

A=rand(20);            % set up a 20x20 random matrix 

A=.5*(A+A');           % symmetrize: average of A and its transpose 

mesh(expm(-beta*A))    % mesh plot of the matrix exponential (expm, not the element-wise exp) 

For decreasing beta, the exponential of $-\beta A$ should approach the identity 
matrix. 

Abstract. The purpose of this lecture is to introduce the general concepts for building 
algorithms to solve the time-dependent Schrödinger equation and to discuss ways of 
turning these concepts into unconditionally stable, accurate, and efficient simulation 
algorithms. The approach is illustrated using results of a computer-simulation study of 
charged-particle interferometry, combining features of both the Aharonov-Bohm and 
Hanbury-Brown-Twiss experiment. 



1 Introduction 

Progress in nanoscale lithography has made it possible to perform “electron- 
optics” experiments in solid-state devices [1, 2]. In an ideal device the motion of 
the electrons is not affected by interactions with impurities, phonons etc., i.e., 
the electrons travel ballistically, just as they would do in ultra-high vacuum. In 
real devices, typical distances for ballistic motion can be as large as 250 $\lambda_F$, $\lambda_F$ 
being the Fermi wavelength of the electrons [3]. 

A similar, but otherwise unrelated, breakthrough is the development of atom- 
size field-electron-emission sources. Recent experiments using these atom-size 
tips [4, 5] have demonstrated that they act as unusual electron beam sources, 
emitting electrons at fairly low applied voltages (a few thousand volts or less) 
with a small angular spread (of a few degrees). These properties make such elec- 
tron sources very attractive for applications to electron microscopy, holography, 
and interferometry. 

From a physical point of view, both these nanoscale structures have at least 
one important common feature: the characteristic dimensions of these devices 
are comparable to the wavelength (typically the Fermi wavelength $\lambda_F$) of the 
relevant particles (typically electrons). Under this stringent condition, a classical, 
“billiard-ball” description of the particle motion is no longer valid. A calculation 
of the device properties requires a full quantum-mechanical treatment. 

The dynamic properties of a nonrelativistic quantum system are governed by 
the time-dependent Schrödinger equation (TDSE) 

$$i\hbar\,\frac{\partial}{\partial t}\,|\Phi(t)\rangle = \mathcal{H}\,|\Phi(t)\rangle \qquad (1)$$

* Software included on the accompanying diskette. 
where $|\Phi(t)\rangle$ represents the state of the system described by the Hamiltonian $\mathcal{H}$ 
(here and in the following we use $\mathcal{H}$ to denote the differential operator and $H$ 
for the hermitian matrix representing $\mathcal{H}$). In analogy with ordinary differential 
equations, the formal solution of the matrix differential equation 

$$\frac{\partial}{\partial x}\,U(x) = H\,U(x) , \qquad U(0) = I , \qquad (2)$$

where $I$ denotes the $M \times M$ unit matrix and $H$ is an $M \times M$ matrix, is given by 

$$U(x) = e^{xH} \qquad (3)$$

and is called the exponential of the matrix H. In quantum physics and quan- 
tum statistical mechanics, the exponential of the Hamiltonian is a fundamen- 
tal quantity. All methods for solving these problems compute, one way or an- 
other, (matrix elements of) the exponential of the matrix H. In the case of 
real-time quantum dynamics $x = -it/\hbar$, whereas for quantum statistical problems 
$x = -\beta = -1/k_B T$. 

Formally, the exponential of a matrix $H$ can be defined in terms of the Taylor 
series 

$$e^{xH} = \sum_{n=0}^{\infty} \frac{x^n}{n!}\,H^n , \qquad (4)$$

just as if H were a number. For most problems of interest, there will not be 
enough memory to store the matrix H (typical applications require matrices of 
dimension 10^ x 10^ or larger) and hence there also will be no memory to store 
the full matrix $e^{xH}$. So let us concentrate on the other extreme, the calculation 
of an arbitrary matrix element $\langle\psi|e^{xH}|\psi'\rangle$. Although from a mathematical point 
of view, formal expansion (4) is all that is really needed, when it comes to 
computation (4) is quite useless. The reason is not so much that it is a Taylor 
series but rather that it contains powers of the matrix, indicating that simply 
summing the terms in (4) may be very inefficient (and indeed it is). 

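There is one particular case in which it is easy to compute the matrix element 
$\langle\psi|e^{xH}|\psi'\rangle$, namely if all the eigenvalues and eigenvectors are known. Indeed, from 
(4) it follows that 

$$e^{xH}|\Phi_j\rangle = \sum_{n=0}^{\infty}\frac{x^n}{n!}\,H^n|\Phi_j\rangle = \sum_{n=0}^{\infty}\frac{x^n}{n!}\,E_j^n|\Phi_j\rangle = e^{xE_j}|\Phi_j\rangle , \qquad (5)$$

where (here and in the following) $E_j$ denotes the $j$-th eigenvalue of the matrix 
$H$ and $|\Phi_j\rangle$ is the corresponding eigenvector. We will label the eigenvalues such 
that $E_0 \le E_1 \le \ldots \le E_{M-1}$, where $M$ is the dimension of the matrix $H$. From 
(5) it follows that 

$$\langle\psi|e^{xH}|\psi'\rangle = \sum_{j=0}^{M-1}\langle\psi|\Phi_j\rangle\,e^{xE_j}\,\langle\Phi_j|\psi'\rangle . \qquad (6)$$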
Of course, (6) is almost trivial but it is important to keep in mind that, except 
for some pathological cases, there seems to be no other practical way to compute 
the matrix element $\langle\psi|e^{xH}|\psi'\rangle$ without making approximations (assuming $H$ is 
a large matrix). In general we don't know the solution of the eigenvalue problem 
of the matrix $H$; otherwise we would already have solved the most difficult part 
of the whole problem. Therefore (6) is not of practical use. 

Solving the time-dependent Schrödinger equation for even a single particle 
moving in a non-trivial (electromagnetic) potential is not a simple matter. The 
main reason is that for most problems of interest, the dimension of the matrix 
representing H is quite large and although the dimension of the matrices involved 
is certainly not as large as in the case of typical many-body quantum systems, 
exact diagonalization techniques are quite useless. Indeed, a calculation of the 
time-development of the wave function by exact diagonalization techniques re- 
quires the knowledge of all eigenvectors and all eigenvalues (i.e. for a matrix of 
dimension 10^ x 10^ one needs ^ 10^^ Mb or more RAM to store these data). 
Thus, we need algorithms that do not use more than $O(M + 1)$ storage elements. 
Diagonalization methods that only require $O(M + 1)$ memory locations are of 
no use either because they can only compute a (small) part of the spectrum. 
Methods based on importance sampling concepts cannot be employed at all be- 
cause there is no criterion to decide which state is important or which is not: 
The “weight” of a state is a complex number of “size” one. 

Although from a numerical point of view the TDSE looks like any other 
differential equation which one should be able to solve by standard methods 
(e.g. Runge-Kutta) this similarity is misleading. Standard methods are based on 
(clever) truncations of the Taylor series expansion. It is easy to convince oneself 
that for the TDSE this implies that these numerical algorithms do not conserve 
the norm of the wave function [6]. This, from a physical point of view, is unac- 
ceptable because it means that during the numerical solution of the TDSE, the 
number of particles will change. Moreover, it can be shown [6] that this implies 
that these methods are not always stable with respect to rounding and other 
numerical errors. For completeness it should be mentioned that the Crank-Nicolson 
algorithm does conserve the norm of the wave function and is unconditionally 
stable. However, except for one-dimensional problems, in terms of 
accuracy and efficiency it cannot compete with the algorithms to be discussed 
below [6]. 
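
The difference between a truncated Taylor step and a unitary step is easy to see on a small example. The following Python sketch is not part of the accompanying software: the 2 x 2 Hamiltonian, the time step and the number of steps are arbitrary illustrative choices. It propagates the same state with the first-order Taylor step I - i*tau*H and with the Crank-Nicolson step (I + i*tau*H/2)^(-1) (I - i*tau*H/2), and compares the norms after many steps.

```python
import math

def matvec(M, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def matmul(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inverse2(M):
    """Inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

def identity_plus(H, factor):
    """Return I + factor*H for a 2x2 matrix H."""
    return [[(1 if i == j else 0) + factor * H[i][j] for j in range(2)] for i in range(2)]

def norm(v):
    return math.sqrt(sum(abs(z) ** 2 for z in v))

def propagate(step, psi, n):
    """Apply the one-step matrix n times."""
    for _ in range(n):
        psi = matvec(step, psi)
    return psi

# Hypothetical hermitian two-level Hamiltonian (hbar absorbed in tau).
H = [[1.0, 0.5], [0.5, -1.0]]
tau = 0.05
steps = 2000

# First-order Taylor step: not unitary, the norm grows at every step.
euler = identity_plus(H, -1j * tau)
# Crank-Nicolson step: unitary for hermitian H.
crank_nicolson = matmul(inverse2(identity_plus(H, 0.5j * tau)),
                        identity_plus(H, -0.5j * tau))

psi0 = [1.0 + 0j, 0j]
norm_euler = norm(propagate(euler, psi0, steps))
norm_cn = norm(propagate(crank_nicolson, psi0, steps))
print("norm after Taylor steps:        ", norm_euler)
print("norm after Crank-Nicolson steps:", norm_cn)
```

In this example the Taylor step multiplies the norm by sqrt(1 + tau^2 E^2) for each eigencomponent at every step, so the "number of particles" drifts away from one, whereas the Crank-Nicolson norm stays at one up to rounding.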

A key concept in the construction of an algorithm for solving the TDSE 
is the so-called unconditional stability. An algorithm for solving the TDSE is 
unconditionally stable if the norm of the wavefunction is conserved exactly, at 
all times [6]. From a physical point of view, unconditional stability is obviously 
an essential requirement. If an algorithm is unconditionally stable the errors due 
to rounding, discretization etc. never get out of hand, irrespective of the choice 
of the grid, the time step, or the number of propagation steps. Recall that the 
formal solution of the TDSE is given by 

$$|\Phi(m\tau)\rangle = e^{-im\tau H}\,|\Phi(t=0)\rangle , \qquad (7)$$

where $m = 0, 1, \ldots$ counts the number of time-steps $\tau$. Here and in the following 
we absorb $\hbar$ in $\tau$. 

A simple, general recipe for constructing an unconditionally stable algorithm 
is to use unitary approximations to the (unitary) time-step operator $U(\tau) = 
e^{-i\tau H}$. The Trotter-Suzuki product formula approach, to be discussed in the 
next section, provides the necessary mathematical framework for constructing 
unconditionally stable, accurate, and efficient algorithms to solve the TDSE [6]. 

2 Theory 

In all cases that we know of, the Hamiltonian is a sum of several contributions 
and each contribution itself is usually simple enough so that we can diagonalize 
it ourselves by some (simple) transformation. The Hamiltonian for a particle in 
a potential provides the most obvious example: we can write the Hamiltonian 
as a sum of the free-particle Hamiltonian and a potential energy. It is trivial to 
diagonalize both parts independently, but it is usually impossible to diagonalize 
the sum. 

The question we can now put to ourselves is the following. Suppose we can 
diagonalize each of the terms in H by hand. Then, it is very reasonable to assume 
that we can also compute the exponential of each of the contributions separately 
(see the discussion in the previous section). Is there then a relation between the 
exponentials of each of the contributions to H and the exponential of H and if 
so, can we use it to compute the latter? 

The answer to this question is affirmative and can be found in the mathe- 
matical literature of the previous century. The following fundamental result due 
to Lie [7] is the basis for the Trotter-Suzuki method for solving quantum prob- 
lems. It expresses the exponential of a sum of two matrices as an infinite ordered 
product of the exponentials of the two individual matrices: 

$$e^{x(A+B)} = \lim_{m\to\infty}\left(e^{xA/m}\,e^{xB/m}\right)^m , \qquad (8)$$


where, for our purposes, A and B are M x M matrices. The result (8) is called 
the Trotter formula. A first hint for understanding why (8) holds comes from 
comparing the two Taylor series 

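$$\begin{aligned}
e^{x(A+B)/m} &= I + \frac{x}{m}(A + B) + \frac{1}{2}\frac{x^2}{m^2}(A + B)^2 + O(x^3/m^3) \\
&= I + \frac{x}{m}(A + B) + \frac{1}{2}\frac{x^2}{m^2}(A^2 + AB + BA + B^2) + O(x^3/m^3) , \qquad (9)
\end{aligned}$$

and 

$$e^{xA/m}\,e^{xB/m} = I + \frac{x}{m}(A + B) + \frac{1}{2}\frac{x^2}{m^2}(A^2 + 2AB + B^2) + O(x^3/m^3) . \qquad (10)$$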
It is clear that for sufficiently large $m$, both expansions will agree up to terms of 
$O(x^2\|[A,B]\|/m^2)$.$^1$ Thus, for sufficiently large $m$ (how large depends on $x$ and 
$\|[A,B]\|$), 

$$e^{x(A+B)/m} \approx e^{xA/m}\,e^{xB/m} . \qquad (11)$$

A mathematically rigorous treatment shows that [9] 

$$\left\|e^{x(A+B)/m} - e^{xA/m}\,e^{xB/m}\right\| \le \frac{|x|^2}{2m^2}\,\|[A,B]\|\;e^{|x|(\|A\|+\|B\|)/m} , \qquad (12)$$

demonstrating that for finite $m$, the difference between the exponential of a sum 
of two matrices and the ordered product of the individual exponentials vanishes 
as $x^2/m^2$. As expected, (12) also reveals that this difference is zero if $A$ and $B$ 
commute: if $[A, B] = 0$ then $e^{x(A+B)} = e^{xA}e^{xB}$. For the case at hand, $x = -im\tau$, 
the upper bound in (12) can be improved considerably to read [6] 

$$\left\|e^{-i\tau(A+B)} - e^{-i\tau A}\,e^{-i\tau B}\right\| \le \frac{\tau^2}{2}\,\|[A,B]\| . \qquad (13)$$
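
The bound (13) can be checked numerically on the smallest non-trivial example. The Python sketch below is an illustration only: the choice of the Pauli matrices for A and B, the value of tau and all helper names are assumptions made here, not taken from the text. It uses the normalized norm of footnote 1 and a closed-form exponential that is valid for 2 x 2 hermitian matrices.

```python
import cmath
import math

SX = [[0, 1], [1, 0]]      # Pauli matrix sigma_x, plays the role of A
SZ = [[1, 0], [0, -1]]     # Pauli matrix sigma_z, plays the role of B

def lin(X, Y, s):
    """Return X + s*Y for 2x2 matrices."""
    return [[X[i][j] + s * Y[i][j] for j in range(2)] for i in range(2)]

def mul(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mnorm(X):
    """Normalized norm ||X|| = M**(-1/2) * (Tr X^dagger X)**(1/2), here with M = 2."""
    return math.sqrt(sum(abs(X[i][j]) ** 2 for i in range(2) for j in range(2)) / 2)

def exp_minus_i(H, tau):
    """exp(-i*tau*H) for a 2x2 hermitian H.

    Write H = a*I + K with K traceless; then K*K = b*b*I and the
    exponential series sums to cos(tau*b)*I - i*sin(tau*b)*K/b."""
    a = (H[0][0] + H[1][1]) / 2
    K = [[H[i][j] - (a if i == j else 0) for j in range(2)] for i in range(2)]
    b = math.sqrt(abs(K[0][0]) ** 2 + abs(K[0][1]) ** 2)
    c = math.cos(tau * b)
    s = math.sin(tau * b) / b if b > 0 else tau
    phase = cmath.exp(-1j * tau * a)
    return [[phase * ((c if i == j else 0) - 1j * s * K[i][j]) for j in range(2)]
            for i in range(2)]

tau = 0.1
commutator = lin(mul(SX, SZ), mul(SZ, SX), -1)          # [A, B]
exact = exp_minus_i(lin(SX, SZ, 1), tau)                # exp(-i*tau*(A+B))
split = mul(exp_minus_i(SX, tau), exp_minus_i(SZ, tau)) # exp(-i*tau*A) exp(-i*tau*B)

error = mnorm(lin(exact, split, -1))
bound = 0.5 * tau ** 2 * mnorm(commutator)
print("error:", error, " bound (13):", bound)
```

For small tau the error lies just below the bound, which shows that (13) cannot be improved much further for a first-order product formula.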



Except for the fact that we assumed that $H = A + B$, the above discussion 
has been extremely general. This suggests that one can apply the Trotter-Suzuki 
approach to a wide variety of problems and indeed one can. We have only discussed 
the simplest form of the Trotter formula. There now exist a vast 
number of extensions and generalizations of which we will consider only three. 

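The Trotter formula is readily generalized to the case of more than two 
contributions to $H$. Writing $H = \sum_{i=1}^{p} A_i$ it can be shown that [6, 9] 

$$\left\|e^{-i\tau(A_1+\ldots+A_p)} - e^{-i\tau A_1}\cdots e^{-i\tau A_p}\right\| \le \frac{\tau^2}{2}\sum_{1\le i<j\le p}\|[A_i,A_j]\| , \qquad (14)$$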



showing that any decomposition of the Hamiltonian qualifies as a candidate for 
applying the Trotter-Suzuki approach. This is an important conclusion because 
the flexibility of choosing the decomposition of H can be exploited to construct 
efficient algorithms. From the above discussion it is also clear that at no point 
an assumption was made about the “importance” of a particular contribution 
to H. This is the reason why the Trotter-Suzuki approach can be used where 
perturbation methods break down. 

The product formula (11) is the simplest one can think of. We use it to define 
an approximate time-step operator 

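$$U_1(\tau) = e^{-i\tau A_1}\cdots e^{-i\tau A_p} . \qquad (15)$$

The hermitian conjugate of this operator is given by 

$$U_1^\dagger(\tau) = e^{i\tau A_p}\cdots e^{i\tau A_1} , \qquad (16)$$

from which it follows that 

$$U_1(\tau)\,U_1^\dagger(\tau) = I . \qquad (17)$$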



1 The norm of a matrix $X$ is defined by $\|X\| = M^{-1/2}(\mathrm{Tr}\,X^\dagger X)^{1/2}$. 
For simplicity we have assumed that $H$ has been written as a sum of hermitian 
contributions, i.e., $A_i = A_i^\dagger$ for $i = 1, \ldots, p$. Result (17) implies that 
$(U_1(\tau))^{-1} = U_1^\dagger(\tau)$; hence $U_1(\tau)$ is a unitary approximation to the time-step operator 
$e^{-i\tau H}$. Thus, if we succeed in implementing $U_1(\tau)$, the resulting algorithm 
will be unconditionally stable by construction. The upper bound in (14) shows 
that the error made by replacing $e^{-i\tau H}$ by $U_1(\tau)$ will, in the worst case, never 
exceed a constant multiplied by $\tau^2$. Therefore $U_1(\tau)$ is said to be a first-order 
approximant to the time-step operator. 

For many applications it is necessary to employ an algorithm that is correct 
up to fourth order in the time step. Approximations correct up to second order 
are obtained by symmetrization [6, 8, 9] 

$$U_2(\tau) = U_1^T(\tau/2)\,U_1(\tau/2) , \qquad (18)$$

where $U_1^T$ is the transpose of $U_1$, i.e., the same exponentials multiplied in reverse order. Trotter-Suzuki formula-based procedures to 
construct algorithms that are correct up to fourth order in the time step are given 
in [6]. From a practical point of view, a disadvantage of the fourth-order methods 
introduced in [6] is that they involve commutators of various contributions to the 
Hamiltonian. Recently Suzuki proposed a symmetrized fractal decomposition of 
the time-evolution operator [9]. Using this formula, a fourth-order algorithm is 
easily built from a second-order algorithm by applying [9] 

$$U_4(\tau) = U_2(p\tau)\,U_2(p\tau)\,U_2((1 - 4p)\tau)\,U_2(p\tau)\,U_2(p\tau) , \qquad (19)$$

where $p = 1/(4 - 4^{1/3})$ and $U_n(\tau)$ is the $n$th order approximation to $U(\tau)$, 
i.e., $U(\tau) = U_n(\tau) + O(\tau^{n+1})$. It is trivial to show that all of the above approximations 
are unitary operators, hence the corresponding algorithms will be 
unconditionally stable. Note that once we have programmed a first-order algo- 
rithm, writing the code to implement the second- and fourth-order algorithms 
will normally only take a few seconds. 
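
How little code is needed to go from first to second and fourth order can be seen from the following Python sketch. It is only an illustration under assumptions made here: the Hamiltonian H = A + B is built from two Pauli matrices, the helper names are hypothetical, and the one-step errors are compared for tau = 0.1 and tau = 0.05. Since the one-step error of an nth-order approximant scales as tau^(n+1), halving tau should reduce it by factors of about 4, 8 and 32.

```python
import cmath
import math

A = [[0, 1], [1, 0]]       # sigma_x (illustrative choice)
B = [[1, 0], [0, -1]]      # sigma_z (illustrative choice)

def lin(X, Y, s):
    return [[X[i][j] + s * Y[i][j] for j in range(2)] for i in range(2)]

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def dagger(X):
    return [[complex(X[j][i]).conjugate() for j in range(2)] for i in range(2)]

def mnorm(X):
    return math.sqrt(sum(abs(X[i][j]) ** 2 for i in range(2) for j in range(2)) / 2)

def exp_minus_i(H, tau):
    """Closed form of exp(-i*tau*H) for a 2x2 hermitian matrix H."""
    a = (H[0][0] + H[1][1]) / 2
    K = [[H[i][j] - (a if i == j else 0) for j in range(2)] for i in range(2)]
    b = math.sqrt(abs(K[0][0]) ** 2 + abs(K[0][1]) ** 2)
    c = math.cos(tau * b)
    s = math.sin(tau * b) / b if b > 0 else tau
    phase = cmath.exp(-1j * tau * a)
    return [[phase * ((c if i == j else 0) - 1j * s * K[i][j]) for j in range(2)]
            for i in range(2)]

def u1(tau):
    """First-order approximant (15)."""
    return mul(exp_minus_i(A, tau), exp_minus_i(B, tau))

def u1_reversed(tau):
    """Same exponentials as u1, multiplied in reverse order."""
    return mul(exp_minus_i(B, tau), exp_minus_i(A, tau))

def u2(tau):
    """Second-order approximant (18)."""
    return mul(u1_reversed(tau / 2), u1(tau / 2))

P = 1 / (4 - 4 ** (1 / 3))

def u4(tau):
    """Fourth-order approximant (19), built from five second-order steps."""
    out = [[1, 0], [0, 1]]
    for f in (P, P, 1 - 4 * P, P, P):
        out = mul(out, u2(f * tau))
    return out

H = lin(A, B, 1)

def one_step_error(approx, tau):
    return mnorm(lin(exp_minus_i(H, tau), approx(tau), -1))

ratio1 = one_step_error(u1, 0.1) / one_step_error(u1, 0.05)
ratio2 = one_step_error(u2, 0.1) / one_step_error(u2, 0.05)
ratio4 = one_step_error(u4, 0.1) / one_step_error(u4, 0.05)
unitarity_defect = mnorm(lin(mul(dagger(u4(0.1)), u4(0.1)), [[1, 0], [0, 1]], -1))
print(ratio1, ratio2, ratio4, unitarity_defect)
```

Note that u2 and u4 only call u1, so a program that implements the first-order step for a realistic Hamiltonian can be upgraded in the same way.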



3 Data Analysis 

The amount of data generated by a TDSE solver can be tremendous: the wave 
function is known at each time step so that in principle the TDSE solver can 
generate $O(16mM)$ bytes of data in a single run. In typical applications, $M \approx 
10^6$ and $m > 1000$. Clearly it may be difficult to store all these data. Therefore 
it is more appropriate to process the data as it is generated and compress it as 
much as possible. 

A very appealing method for looking at the data is to make say 100 snap- 
shots of the (coarse-grained) probability distribution and to use visualization 
techniques to produce digital videos [10, 11, 12]. Simply looking at these videos 
can bring a lot of insight; however, to be on the safe side this insight should be 
confronted with the results of more advanced, numerical processing of the data. 

The numerical processing of the raw data generated by the TDSE solver de- 
pends to a considerable extent on the details of the actual application. Therefore 
I will not dwell on this subject in full generality but confine myself to a discussion 
of a simple, widely applicable method for extracting information about the 
spectrum of the model Hamiltonian from the raw data. 

The idea is straightforward. Consider the matrix element $\langle\Phi(t=0)|\Phi(t)\rangle$ and 
write $|\Phi(t)\rangle$ in terms of the (unknown) eigenvalues and eigenvectors of $H$ to 
obtain 

$$f(t) = \langle\Phi(t=0)|\Phi(t)\rangle = \sum_{j=0}^{M-1} e^{-itE_j}\,|\langle\Phi(t=0)|\Phi_j\rangle|^2 . \qquad (20)$$

From (20) it is clear that the Fourier transform of $f(t)$ with respect to $t$ will give 
direct information on all the $E_j$'s for which the overlap $|\langle\Phi(t=0)|\Phi_j\rangle|^2$ is not 
negligible. In other words, if we keep all the values of $f(t = m\tau)$ and compute its 
Fourier transform, we obtain the local (with respect to the initial state $\Phi(t=0)$) 
density of states. 
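
A minimal Python sketch of this procedure is given below. Everything in it is an assumption made for illustration: instead of output from a TDSE solver it uses a synthetic f(m*tau) built from two hypothetical energies and weights, and the energies are placed exactly on the frequency grid of the discrete Fourier transform so that the peaks are sharp.

```python
import cmath
import math

N = 64        # number of stored values f(m*tau), m = 0, ..., N-1
tau = 0.1     # time step (hbar absorbed in tau)

# Hypothetical spectrum: two energies on the grid 2*pi*k/(N*tau).
bins = [5, 12]
energies = [2 * math.pi * k / (N * tau) for k in bins]
weights = [0.7, 0.3]   # overlaps |<Phi(0)|Phi_j>|^2, adding up to one

# f(m*tau) = sum_j weight_j * exp(-i*m*tau*E_j), cf. (20)
f = [sum(w * cmath.exp(-1j * m * tau * E) for w, E in zip(weights, energies))
     for m in range(N)]

# Discrete Fourier transform of f with respect to time.
spectrum = [abs(sum(f[m] * cmath.exp(2j * math.pi * k * m / N) for m in range(N))) / N
            for k in range(N)]

# The two largest entries of the spectrum mark the energies present in f.
peaks = sorted(sorted(range(N), key=lambda k: -spectrum[k])[:2])
for k in peaks:
    print("E =", 2 * math.pi * k / (N * tau), " weight =", spectrum[k])
```

With real simulation data the energies do not fall on the frequency grid, so the peaks acquire a finite width of order 2*pi/(N*tau); longer runs give a better energy resolution.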



4 Implementation 



In general there will be many possibilities for writing down different decompositions 
of a given Hamiltonian. From a theoretical point of view, the choice of the 
decomposition is arbitrary. In practice, however, this flexibility can be exploited 
to a considerable extent to tailor the algorithm to the computer architecture on 
which the algorithm will execute. Of particular interest are decompositions that 
vectorize well and have a large intrinsic degree of parallelism. 

We now illustrate the application of the theory presented above to the case of 
a charged (spinless) nonrelativistic particle in an external, static magnetic field 
$\mathbf{B}$. The Hamiltonian reads 

$$\mathcal{H} = \frac{1}{2m^*}\left(\mathbf{p} - e\mathbf{A}\right)^2 + V , \qquad (21)$$

where $m^*$ is the effective mass of the particle with charge $e$, $\mathbf{p} = -i\hbar\nabla$ is the momentum 
operator, $\mathbf{A}$ represents the vector potential and $V$ denotes the potential. 
For many applications it is sufficient to consider the choice $\mathbf{B} = (0, 0, B(x, y))$ 
and $V = V(x,y)$. Then the problem is essentially two-dimensional and the motion 
of the particle may be confined to the x-y plane. For numerical work, there 
is no compelling reason to adopt the Coulomb gauge ($\mathrm{div}\,\mathbf{A} = 0$). A convenient 
choice for the vector potential is $\mathbf{A} = (A_x(x,y), 0, 0)$ where 

$$A_x(x,y) = -\int_0^y B(x,y')\,dy' . \qquad (22)$$



We will solve the TDSE for the Hamiltonian (21) with the boundary condi- 
tion that the wave function is zero outside the simulation box, i.e., we assume 
perfectly reflecting boundaries. 

For computational purposes it is expedient to express all quantities in dimensionless 
units. Fixing the unit of length by $\lambda$, wavevectors are measured in units 
of $k = 2\pi/\lambda$, energies in $E = \hbar^2k^2/2m^*$, time in $\hbar/E$ and the vector potential 
in $e\lambda/\hbar$. Expressed in these dimensionless variables Hamiltonian (21) reads 

$$\mathcal{H} = -\frac{1}{4\pi^2}\left\{\left[\frac{\partial}{\partial x} - iA_x(x,y)\right]^2 + \frac{\partial^2}{\partial y^2}\right\} + V(x,y) . \qquad (23)$$



An essential step in the construction of a numerical algorithm is to discretize the 
derivatives with respect to the x and y coordinates (of course, if the problem is 
defined on a lattice instead of in continuum space this step can be omitted). For 
many purposes, it is necessary to use a difference formula for the first and second 
derivatives in (23) that is accurate up to fourth order in the spatial mesh size 
$\delta$. Using the standard four and five point difference formula [13] the discretized 
r.h.s. of (23) reads 

$$\begin{aligned}
\mathcal{H}\Phi_{l,k}(t) = \frac{1}{48\pi^2\delta^2}\Big\{ & \left[1 - i\delta(A_{l,k} + A_{l+2,k})\right]\Phi_{l+2,k}(t) + \left[1 + i\delta(A_{l,k} + A_{l-2,k})\right]\Phi_{l-2,k}(t) \\
& - 16\left[1 - \tfrac{i\delta}{2}(A_{l,k} + A_{l+1,k})\right]\Phi_{l+1,k}(t) - 16\left[1 + \tfrac{i\delta}{2}(A_{l,k} + A_{l-1,k})\right]\Phi_{l-1,k}(t) \\
& + \Phi_{l,k+2}(t) + \Phi_{l,k-2}(t) - 16\Phi_{l,k+1}(t) - 16\Phi_{l,k-1}(t) \\
& + \left[60 + 12\delta^2A_{l,k}^2 + 48\pi^2\delta^2V_{l,k}\right]\Phi_{l,k}(t)\Big\} + O(\delta^4) , \qquad (24)
\end{aligned}$$

where $\Phi_{l,k}(t) = \Phi(l\delta,k\delta,t)$ and $A_{l,k} = A_x(l\delta,k\delta)$. The discretized form (24) will 
provide a good approximation to the continuum problem if $\delta$ is substantially 
smaller than the smallest physical length scale. For the case at hand there are 
two such scales. One is the de Broglie wavelength of the particle (which by 
definition is equal to $\lambda$) and the other is the (smallest) magnetic length defined 
by $l_B^2 = \min_{(x,y)}|\hbar/eB(x,y)|$. From numerical calculations (not shown) it follows 
that $\delta = 0.1\min(1, l_B)$ yields a good compromise between accuracy and the CPU 
time required to solve the TDSE. 
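
The fourth-order accuracy of the difference formulas used above is easily verified. The Python sketch below is an illustration, not part of the original software: it applies the standard four-point formula for the first derivative and the five-point formula for the second derivative to the test function sin(x) at an arbitrarily chosen point and checks how the error decreases when the mesh size is halved.

```python
import math

def first_derivative(f, x, d):
    """Four-point central formula for f'(x); truncation error of order d**4."""
    return (-f(x + 2 * d) + 8 * f(x + d) - 8 * f(x - d) + f(x - 2 * d)) / (12 * d)

def second_derivative(f, x, d):
    """Five-point central formula for f''(x); truncation error of order d**4."""
    return (-f(x + 2 * d) + 16 * f(x + d) - 30 * f(x)
            + 16 * f(x - d) - f(x - 2 * d)) / (12 * d * d)

x0 = 0.3                      # arbitrary test point
mesh_sizes = (0.1, 0.05)

# Errors with respect to the exact derivatives of sin(x).
err1 = [abs(first_derivative(math.sin, x0, d) - math.cos(x0)) for d in mesh_sizes]
err2 = [abs(second_derivative(math.sin, x0, d) + math.sin(x0)) for d in mesh_sizes]

ratio1 = err1[0] / err1[1]
ratio2 = err2[0] / err2[1]
print("error ratio, first derivative: ", ratio1)
print("error ratio, second derivative:", ratio2)
```

Both ratios come out close to 2^4 = 16, as expected for formulas whose error is of fourth order in the mesh size. The weights 1, -16, 30 (or 60 for two directions) are the ones that reappear in (24).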

Straightforward application of the product-formula recipe to (24) requires 
a cumbersome matrix notation. This can be avoided in the following way [6]. 
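Defining 

$$|\Phi(t)\rangle = \sum_{l=1}^{L_x}\sum_{k=1}^{L_y}\Phi_{l,k}(t)\,c^\dagger_{l,k}|0\rangle , \qquad (25)$$

where $L_x$ and $L_y$ are the number of grid points in the x and y direction respectively 
and $c^\dagger_{l,k}$ creates a particle at lattice site $(l,k)$, (25) can be written 
as 

$$|\Phi(m\tau)\rangle = e^{-im\tau H}\,|\Phi(t=0)\rangle , \qquad (26)$$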
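where 

$$\begin{aligned}
H = {} & \frac{1}{48\pi^2\delta^2}\sum_{l=1}^{L_x-2}\sum_{k=1}^{L_y}\Big\{\left[1 - i\delta(A_{l,k} + A_{l+2,k})\right]c^\dagger_{l,k}c_{l+2,k} + \left[1 + i\delta(A_{l,k} + A_{l+2,k})\right]c^\dagger_{l+2,k}c_{l,k}\Big\} \\
& - \frac{1}{3\pi^2\delta^2}\sum_{l=1}^{L_x-1}\sum_{k=1}^{L_y}\Big\{\left[1 - \tfrac{i\delta}{2}(A_{l,k} + A_{l+1,k})\right]c^\dagger_{l,k}c_{l+1,k} + \left[1 + \tfrac{i\delta}{2}(A_{l,k} + A_{l+1,k})\right]c^\dagger_{l+1,k}c_{l,k}\Big\} \\
& + \frac{1}{48\pi^2\delta^2}\sum_{l=1}^{L_x}\sum_{k=1}^{L_y-2}\left(c^\dagger_{l,k}c_{l,k+2} + c^\dagger_{l,k+2}c_{l,k}\right) \\
& - \frac{1}{3\pi^2\delta^2}\sum_{l=1}^{L_x}\sum_{k=1}^{L_y-1}\left(c^\dagger_{l,k}c_{l,k+1} + c^\dagger_{l,k+1}c_{l,k}\right) \\
& + \frac{1}{48\pi^2\delta^2}\sum_{l=1}^{L_x}\sum_{k=1}^{L_y}\left(60 + 12\delta^2A_{l,k}^2 + 48\pi^2\delta^2V_{l,k}\right)c^\dagger_{l,k}c_{l,k} + O(\delta^4) , \qquad (27)
\end{aligned}$$

and where $c_{l,k}$ annihilates a particle at lattice site $(l,k)$. 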

Hamiltonian (27) describes a particle that moves on a two-dimensional lat- 
tice by making nearest- and next-nearest-neighbor jumps. This interpretation 
suggests that H should be written as a sum of terms that represent groups of 
independent jumps [6]. A convenient choice is 



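$$\begin{aligned}
A_1 &= \frac{1}{48\pi^2\delta^2}\sum_{l\in X_1}\sum_{k=1}^{L_y}\Big\{\left[1 - i\delta(A_{l,k} + A_{l+2,k})\right]c^\dagger_{l,k}c_{l+2,k} + \left[1 + i\delta(A_{l,k} + A_{l+2,k})\right]c^\dagger_{l+2,k}c_{l,k}\Big\} ; & X_1 &= \{1,2,5,6,9,10,\ldots\} , \\
A_2 &= \frac{1}{48\pi^2\delta^2}\sum_{l\in X_2}\sum_{k=1}^{L_y}\Big\{\left[1 - i\delta(A_{l,k} + A_{l+2,k})\right]c^\dagger_{l,k}c_{l+2,k} + \left[1 + i\delta(A_{l,k} + A_{l+2,k})\right]c^\dagger_{l+2,k}c_{l,k}\Big\} ; & X_2 &= \{3,4,7,8,11,12,\ldots\} , \\
A_3 &= \frac{-1}{3\pi^2\delta^2}\sum_{l\in X_3}\sum_{k=1}^{L_y}\Big\{\left[1 - \tfrac{i\delta}{2}(A_{l,k} + A_{l+1,k})\right]c^\dagger_{l,k}c_{l+1,k} + \left[1 + \tfrac{i\delta}{2}(A_{l,k} + A_{l+1,k})\right]c^\dagger_{l+1,k}c_{l,k}\Big\} ; & X_3 &= \{1,3,5,7,9,11,\ldots\} , \\
A_4 &= \frac{-1}{3\pi^2\delta^2}\sum_{l\in X_4}\sum_{k=1}^{L_y}\Big\{\left[1 - \tfrac{i\delta}{2}(A_{l,k} + A_{l+1,k})\right]c^\dagger_{l,k}c_{l+1,k} + \left[1 + \tfrac{i\delta}{2}(A_{l,k} + A_{l+1,k})\right]c^\dagger_{l+1,k}c_{l,k}\Big\} ; & X_4 &= \{2,4,6,8,10,12,\ldots\} , \\
A_5 &= \frac{1}{48\pi^2\delta^2}\sum_{k\in X_5}\sum_{l=1}^{L_x}\left(c^\dagger_{l,k}c_{l,k+2} + c^\dagger_{l,k+2}c_{l,k}\right) ; & X_5 &= \{1,2,5,6,9,10,\ldots\} , \\
A_6 &= \frac{1}{48\pi^2\delta^2}\sum_{k\in X_6}\sum_{l=1}^{L_x}\left(c^\dagger_{l,k}c_{l,k+2} + c^\dagger_{l,k+2}c_{l,k}\right) ; & X_6 &= \{3,4,7,8,11,12,\ldots\} , \\
A_7 &= \frac{-1}{3\pi^2\delta^2}\sum_{k\in X_7}\sum_{l=1}^{L_x}\left(c^\dagger_{l,k}c_{l,k+1} + c^\dagger_{l,k+1}c_{l,k}\right) ; & X_7 &= \{1,3,5,7,9,11,\ldots\} , \\
A_8 &= \frac{-1}{3\pi^2\delta^2}\sum_{k\in X_8}\sum_{l=1}^{L_x}\left(c^\dagger_{l,k}c_{l,k+1} + c^\dagger_{l,k+1}c_{l,k}\right) ; & X_8 &= \{2,4,6,8,10,12,\ldots\} , \\
A_9 &= \frac{1}{48\pi^2\delta^2}\sum_{k=1}^{L_y}\sum_{l=1}^{L_x}\left(60 + 12\delta^2A_{l,k}^2 + 48\pi^2\delta^2V_{l,k}\right)c^\dagger_{l,k}c_{l,k} , \qquad (28)
\end{aligned}$$

and 

$$U_1(\tau) = \prod_{n=1}^{9} e^{-i\tau A_n} \qquad (29)$$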



is the first-order approximant from which the algorithm, correct up to fourth 
order in the spatial ($\delta$) and temporal ($\tau$) mesh size, can be built. 

Inspection of $A_n$ for $n = 1, \ldots, 9$ shows that each of the terms commutes 
with all the other terms in the sum over $k$ and $l$. This is because each of these 
terms corresponds to a jump of the particle between a pair of two isolated sites. 
For the purpose of implementation, this feature is of extreme importance [6]. To 
illustrate this point it is sufficient to consider the first of the exponents in (29) 
and use the fact that all terms commute to rewrite it as 

$$e^{-i\tau A_1} = \prod_{k=1}^{L_y}\prod_{l\in X_1}\exp\left(\frac{-i\tau}{48\pi^2\delta^2}\Big\{\left[1 - i\delta(A_{l,k} + A_{l+2,k})\right]c^\dagger_{l,k}c_{l+2,k} + \left[1 + i\delta(A_{l,k} + A_{l+2,k})\right]c^\dagger_{l+2,k}c_{l,k}\Big\}\right) . \qquad (30)$$
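Furthermore, each of the exponents in the product (30) describes a two-site 
system, and the exponent of the corresponding $2 \times 2$ matrix can be worked out 
analytically [6]. In general 

$$\exp\left[-i\tau\left(\alpha\,c^\dagger_{l,k}c_{l',k'} + \alpha^*c^\dagger_{l',k'}c_{l,k}\right)\right] = \left(c^\dagger_{l,k}c_{l,k} + c^\dagger_{l',k'}c_{l',k'}\right)\cos\tau|\alpha| - i\left(\alpha\,c^\dagger_{l,k}c_{l',k'} + \alpha^*c^\dagger_{l',k'}c_{l,k}\right)\frac{\sin\tau|\alpha|}{|\alpha|} . \qquad (31)$$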



The rather formal language used above easily translates into a computer pro- 
gram. All that (28)-(31) imply is that for each factor in product formula (29), 
one has to pick successive pairs of lattice points, get the values of the wave func- 
tion for each pair of points and perform a plane rotation using matrices of the 
form 



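$$M = \begin{pmatrix} \cos\tau|\alpha| & -i\alpha|\alpha|^{-1}\sin\tau|\alpha| \\ -i\alpha^*|\alpha|^{-1}\sin\tau|\alpha| & \cos\tau|\alpha| \end{pmatrix} . \qquad (32)$$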



For each of the nine exponentials$^2$, the order in which the pairs of points are 
processed is irrelevant. Therefore, the computation of each of the nine factors 
can be done entirely in parallel, fully vectorized, or mixed parallel and vectorized, 
depending on the computer architecture on which the code will execute. Further 
technical details on the implementation of this algorithm can be found elsewhere 
[14]. 
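
To show how the plane rotations look in a program, here is a minimal Python sketch for a much simpler model than (27). It is an illustration under stated assumptions, not the algorithm of [14]: a one-dimensional chain with nearest-neighbor jumps only, a constant hypothetical amplitude alpha, no potential, and a first-order product formula with the bonds split into two groups of isolated pairs. The function and variable names are made up for this example.

```python
import math

def rotate_pairs(psi, pairs, alpha, tau):
    """Apply exp(-i*tau*(alpha c_l^+ c_m + conj(alpha) c_m^+ c_l)) to each pair (l, m).

    The pairs within one group do not share sites, so they can be
    processed in any order (or in parallel)."""
    c = math.cos(tau * abs(alpha))
    s = math.sin(tau * abs(alpha))
    u = alpha / abs(alpha)                     # phase factor alpha/|alpha|
    for l, m in pairs:
        a, b = psi[l], psi[m]
        psi[l] = c * a - 1j * u * s * b        # first row of the rotation matrix
        psi[m] = -1j * u.conjugate() * s * a + c * b   # second row

L = 16                                  # number of lattice sites
alpha = complex(-1.0, 0.0)              # hypothetical jump amplitude
bonds_a = [(l, l + 1) for l in range(0, L - 1, 2)]   # (0,1), (2,3), ...
bonds_b = [(l, l + 1) for l in range(1, L - 1, 2)]   # (1,2), (3,4), ...

psi = [0j] * L
psi[L // 2] = 1 + 0j                    # particle initially on the central site
tau = 0.05

for _ in range(20):                     # 20 first-order time steps
    rotate_pairs(psi, bonds_b, alpha, tau)
    rotate_pairs(psi, bonds_a, alpha, tau)

norm_sq = sum(abs(z) ** 2 for z in psi)
p_center = abs(psi[L // 2]) ** 2
print("norm squared:", norm_sq, " probability on central site:", p_center)
```

Each rotation is unitary, so the norm of the wave function is conserved to machine precision whatever the value of tau, while the particle spreads away from its initial site. The two-dimensional algorithm with nine groups of terms has the same structure, with site-dependent amplitudes taken from (28).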



5 Application: Quantum Interference 
of Two Identical Particles 

Trotter-Suzuki-based TDSE solvers have been employed in the study of a variety 
of problems including wave localization in disordered and fractal systems [6, 15], 
electron emission from nanotips [16, 17, 10], Andreev reflection in mesoscopic 
systems [18, 11], the Aharonov-Bohm effect [14, 12], quantum interference of 
charged identical particles [19, 12], etc. Appealing features of the TDSE approach 
are that it is extremely flexible in the sense that it can handle arbitrary 
geometries and (vector) potentials and that its numerical stability and accuracy 
are such that for all practical purposes the solution is exact. 

Trotter-Suzuki-formula-based algorithms can be, and have been, used to 
solve the TDSE for few-body quantum systems, including a 26-site S=1/2 
Heisenberg model [20]. The application of the TDSE approach is mainly limited 
by the storage needed for the (complex-valued) wave function. 

In this section we will use the TDSE approach to study some aspects of 
quantum interference of charged identical particles. Recently Silverman [21, 22] 
proposed and analyzed a thought experiment that combines both the features of 
the Aharonov-Bohm (AB) and Hanbury-Brown and Twiss (HBT) experiments. 

2 The case $n = 9$ is even simpler than the other eight cases but for the sake of brevity, 
a discussion of this detail is omitted. 
Fig. 1. Schematic view of the combined Aharonov-Bohm and Hanbury-Brown-Twiss 
apparatus. Charged fermions or bosons leave the source $S$, pass through the double 
slit and arrive at detectors $D_1$ and $D_2$. The signals of these detectors are multiplied 
in correlator $C$. The particles do not experience the magnetic field $B$ enclosed in the 
double-slit apparatus 



The former provides information on the effect of the magnetic field on correla- 
tions of two amplitudes. The latter on the other hand yields direct information 
on the correlations of two intensities, i.e., of correlations of four amplitudes. 

A schematic view of the AB-HBT apparatus is shown in Fig. 1. Charged 
fermions or bosons leave the source S, pass through the double slit and arrive 
at detectors $D_1$ and $D_2$. In order for the particle statistics to be relevant at all, 
it is necessary that in the detection area the wave functions of two individual 
particles overlap. For simplicity, it is assumed that the particles do not interact. 
The particle statistics may affect the single-particle as well as two-particle in- 
terference. The former can be studied by considering the signal of only one of 
the two detectors. Information on the latter is contained in the cross-correlation 
of the signals of both detectors. Below we report some of our results [19] for 
the AB-HBT thought experiment, as obtained from the numerically exact solution 
of the time-dependent Schrödinger equation (TDSE) using the algorithm 
described above. 

In practice we solve the two-particle TDSE subject to the boundary con- 
dition that the wave function is zero outside the simulation box (a grid of 
$1024 \times 513$ points), i.e., we assume perfectly reflecting boundaries. The algorithm 
that we use is accurate to fourth order in both the spatial and temporal 
mesh size [14]. Additional technical details can be found elsewhere [14]. 
Physical properties are calculated from the two-particle amplitude $\Phi(\mathbf{r}, \mathbf{r}', t) = 
\phi_1(\mathbf{r}, t)\phi_2(\mathbf{r}', t) \pm \phi_2(\mathbf{r}, t)\phi_1(\mathbf{r}', t)$, where $\phi_1(\mathbf{r}, t)$ and $\phi_2(\mathbf{r}, t)$ are the single-particle 
amplitudes and the plus and minus signs correspond to the case of bosons and 
fermions respectively. 
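
The construction of the two-particle amplitude from single-particle amplitudes is illustrated by the short Python sketch below. It is a toy example under assumptions made here: one spatial dimension, real (unnormalized) Gaussians with hypothetical centers and width in place of the propagated single-particle amplitudes, and a fixed time.

```python
import math

def gaussian(x, x0, sigma):
    """Unnormalized Gaussian amplitude centered at x0."""
    return math.exp(-(x - x0) ** 2 / (4 * sigma ** 2))

def phi1(x):
    return gaussian(x, -1.0, 0.5)      # hypothetical first single-particle amplitude

def phi2(x):
    return gaussian(x, +1.0, 0.5)      # hypothetical second single-particle amplitude

def two_particle(x, xp, sign):
    """Two-particle amplitude; sign = +1 for bosons, -1 for fermions."""
    return phi1(x) * phi2(xp) + sign * phi2(x) * phi1(xp)

x, xp = 0.3, -0.7
boson = two_particle(x, xp, +1)
boson_swapped = two_particle(xp, x, +1)
fermion = two_particle(x, xp, -1)
fermion_swapped = two_particle(xp, x, -1)
fermion_same_point = two_particle(x, x, -1)

print("boson:  ", boson, boson_swapped)        # symmetric under exchange
print("fermion:", fermion, fermion_swapped)    # antisymmetric under exchange
print("fermion amplitude at coinciding positions:", fermion_same_point)
```

The fermion amplitude vanishes identically whenever the two coordinates coincide, which is the Pauli principle at the level of the amplitude. Because the particles do not interact, each single-particle amplitude can be propagated separately and combined in this way afterwards.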

Let us first reproduce Silverman’s analysis [21, 22]. Assume that the double- 
slit apparatus can be designed such that the probability for two identical particles 
(fermions or bosons) passing through the same slit can be made negligibly small. 
The two slits then act as the two sources in the HBT experiment with one 
modification: due to the presence of the vector potential the waves can pick up 
an extra phase shift. According to Silverman [21, 22], it immediately follows 
that the signal generated by the cross-correlator will not show any dependence 
on the confined magnetic field. The AB shifts for the direct process and the one 
in which the identical particles have been interchanged mutually cancel. This 
cancelation is independent of whether the particles are fermions or bosons 
[23]. 
Fig. 2. Simulation results for single (top) and correlated (bottom) detector signal for 
$B = 0$, obtained from the solution of the TDSE for the initial state as described in 
the text. Left: signals generated by fermions. Right: signals generated by bosons. The 
corresponding pictures for $B = B_0$ are identical and not shown.$^3$ 

The basic assumption of Silverman’s analysis is easily incorporated into a 
computer experiment. The initial two-particle wave function is a properly symmetrized 
product of single-particle wave functions which, for simplicity, are taken 
to be Gaussians. Each Gaussian is positioned such that during propagation it 
effectively “hits” only one slit. The single (top) and correlated (bottom) signals, 
received by detectors placed far to the right of the slits for $B = 0$ for fermions 
(l.h.s.) as well as for bosons (r.h.s.) are shown in Fig. 2. 

3 $B_0$ is the magnetic field for which the Aharonov-Bohm shift of the interference 
pattern is equal to $\pi$. 

For fermions the correlated signal for $\theta_1 = \theta_2$ vanishes, as required by the 
Pauli principle. This feature is hardly visible, due to the resolution we used 
to generate the pictures, but it is present in the raw data. Within four digit 
accuracy, the corresponding data for $B = B_0$ (or, as a matter of fact, for any 
$B$) are identical to those for $B = 0$ [19]. Comparison of the cross-correlated 
intensities (bottom part) clearly lends support to Silverman’s conclusion [21, 22]. 
However, it is also clear that the single-detector signals (upper part) do 
not exhibit the features characteristic of the AB effect. Under the conditions 
envisaged by Silverman, not only is there no AB effect in the cross-correlated 
signal, there is no AB effect at all. 

The absence of the AB effect can be traced back to Silverman’s assumption 
that the slits can be regarded as sources, thereby eliminating the second, 
topologically different, alternative for a particle to reach the detector. A different 
route to arrive at the same conclusion is to invoke gauge invariance to choose the 
vector potential such that the two particles would never experience a nonzero 
vector potential. 

A full treatment of the thought experiment depicted in Fig. 1 requires that 
all possibilities for both identical particles are included in the analysis. This is 
easily done in the computer experiment by changing the position and width of 
the Gaussians used to build the initial wave function of the fermions or bosons 
such that they both hit the two slits. Some of our results for the case of two 
bosons are shown in Fig. 3. Comparison of the upper parts of Fig. 3 provides 
direct evidence of the presence of the AB effect. 

The cross-correlated boson intensities (r.h.s. of the bottom part of Fig. 3) 
clearly exhibit an AB-like effect. The positions of the maxima and minima are 
interchanged if the magnetic field changes from $B = 0$ to $B = B_0$. We have 
verified that the shift of these positions is a periodic function of the field B. 
These results for the case of boson statistics cannot be explained on the basis of 
Silverman’s theory [21, 22]. 

In general we find that there is only a small quantitative difference between 
the fermion and boson single-detector signals: the interference fringes of the 
fermions are less pronounced than in the case of bosons, another manifestation 
of the Pauli principle. The differences in the cross-correlated fermion intensities, 
due to $B$, are not as clear as in the boson case. Subtracting the $B = 0$ from 
the $B = B_0$ signal and plotting the absolute value of this difference (not shown) 
clearly shows that also the cross-correlated fermion intensity exhibits features 
that are characteristic of the AB effect [19]. The high symmetry in all the correlated 
signals shown is due to our choice $B = 0$ or $B_0$. The fact that we recover 
this symmetry in our simulation data provides an extra check on our method. 
If $B$ is not a multiple of $B_0$, this high symmetry is lost but the salient features 
of the signals remain the same. From our numerical experiments, we conclude 
that in an AB-HBT experiment, an AB shift of the interference pattern will be 
observed in both the single- and two-detector experiments. The AB effect (in 
both experiments) is more pronounced for bosons than for fermions. 

Quantum Dynamics in Nanoscale Devices 

Fig. 3. Simulation results for single (top) and correlated (bottom) detector signal generated 
by two bosons, as obtained from the solution of the TDSE for the initial state 
described in the text. Left: $B = 0$. Right: $B = B_0$ 
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Abstract. The study of dynamical quantum systems, which are classically chaotic, 
and the search for quantum manifestations of classical chaos, require large-scale 
numerical computations. Special numerical techniques developed and applied in such 
studies are discussed: the numerical solution of the time-dependent Schrödinger 
equation, the construction of quantum phase-space densities, quantum dynamics in phase 
space, the use of phase-space entropies for characterizing localization phenomena, etc. 
As an illustration, the dynamics of a driven one-dimensional anharmonic oscillator is 
studied, both classically and quantum mechanically. In addition, spectral properties 
and chaotic tunneling are addressed. 



1 Classical and Quantum Chaos 

During the last three decades it has become evident that the dynamics of sim- 
ple Hamiltonian systems can be remarkably complex. Typical examples of such 
‘simple’ systems are a point mass in a two-dimensional time-independent po- 
tential, or - even simpler - an explicitly time-dependent system with a single 
degree of freedom. In many important cases, such a system can be considered as 
time-periodic. The best-studied case is certainly the celebrated forced or para- 
metrically excited harmonic oscillator. Numerous papers have been published 
that analyze the classical or quantum dynamics of such a harmonic oscillator 
in much detail. One should be aware of the fact, however, that the harmonic 
oscillator is a very special case: the classical equations of motion are linear. For 
all other systems this is not the case. Their behavior is studied in ‘nonlinear 
dynamics’. 

It is well-known that deterministic classical systems show erratic, irregular 
behavior. Moreover, this chaotic dynamics is a generic property. Typical systems 
show an intricate mixture of regular and irregular motion, whose structural 
organization can be most conveniently displayed by means of Poincaré sections 
of phase space. Rather than analyzing a full trajectory in the higher-dimensional 
phase space, one considers only its intersections with a reasonably chosen surface. 
In this way, the dynamics can be treated as a discrete mapping. 

Such a discretization is of particular simplicity for time-periodic systems with 
one degree of freedom. Here one can look at the system stroboscopically, i.e., at 
times $t_n = nT$, $n = 0, 1, \ldots$, where $T$ is the period. Figure 1 shows such a phase-space 
plot for a forced nonharmonic oscillator (see Sect. 4 for details). A synoptic 
plot of several trajectories with different initial conditions is shown. One observes 
regular regions in which the phase-space points generated by a trajectory trace 
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out lines, the so-called invariant curves. Here the motion is regular. In addition, 
we find points that cover an area in phase space. In fact, all these points in 
Fig. 1 are computed from a single trajectory, a chaotic one. Classically, a chaotic 
trajectory - or more generally chaotic dynamics - is defined by an exponential 
separation of initially nearby trajectories in the long time limit (more precisely 
a positive Lyapunov exponent). Numerical studies (computer ‘experiments’) of 
this type are very helpful for studying chaotic dynamics and PC programs are 
available for many systems of interest in physics [28]. More details on the theory 
of the dynamics of Hamiltonian systems in the context of quantum dynamics 
can be found in textbooks (e.g., [44, 35, 21]). 




Fig. 1. Stroboscopic Poincaré section for a classically driven anharmonic oscillator 
(horizontal axis: position q) 



Quantum dynamics, however, is governed by the Schrödinger equation 

$$ i\hbar\,\dot{\psi} = H\psi , \tag{1} $$

which is a linear equation, and it is therefore questionable if such a time evolution 
can be chaotic. For example, it is straightforward to show that, for a finite 
($N$-) dimensional Hilbert space, the time-dependent coefficients in a basis set 
expansion $\psi(t) = \sum_n c_n(t)\,\phi_n$ satisfy a finite system of coupled linear equations. 
Moreover, by separating real and imaginary parts, e.g., $c_n = q_n + i p_n$, these 
differential equations can be written as the canonical equations of motion of a 
classical $N$-dimensional harmonic oscillator, which is certainly not chaotic. 
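
The mapping onto coupled oscillators can be made explicit in a few lines. The sketch below assumes an orthonormal basis $\phi_n$ and, for simplicity, a real symmetric Hamiltonian matrix $H_{nm} = \langle \phi_n | H | \phi_m \rangle$; the symbol $\mathcal{H}$ for the classical Hamiltonian function is introduced here only for illustration.

$$ i\hbar\,\dot{c}_n = \sum_m H_{nm}\, c_m , \quad c_n = q_n + i p_n \;\;\Longrightarrow\;\; \hbar\,\dot{q}_n = \sum_m H_{nm}\, p_m , \quad \hbar\,\dot{p}_n = -\sum_m H_{nm}\, q_m . $$

These are Hamilton's equations $\dot{q}_n = \partial\mathcal{H}/\partial p_n$, $\dot{p}_n = -\partial\mathcal{H}/\partial q_n$ for the quadratic function

$$ \mathcal{H}(q, p) = \frac{1}{2\hbar} \sum_{n,m} H_{nm} \left( p_n p_m + q_n q_m \right) . $$

In the eigenbasis of $H$, where $H_{nm} = E_n \delta_{nm}$, this reduces to a sum of $N$ uncoupled harmonic oscillators with frequencies $E_n/\hbar$.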




Quantum Chaos 227 



Nevertheless, classical mechanics is the limit of quantum mechanics for $\hbar \to 0$, 
and therefore it is of fundamental importance to understand this highly nontrivial 
limit, and considerable work has been done. This fascinating field of contemporary 
research is denoted as quantum chaos [22] or postmodern quantum 
mechanics [24] and various excellent books [21, 40, 23, 14, 34] as well as recent 
conference proceedings [11, 15, 8, 26] summarize the results. 

In order to find the quantum manifestations (if any) of classical chaos, much 
of the recent research is supported by large-scale computations (for small $\hbar$ the 
dimension of the Hilbert space is large, the wave functions are highly oscillatory, 
a long time propagation is of interest, etc.) and special techniques for analyzing 
the system’s behavior have been developed. Here, we will discuss some of these 
methods and illustrate their application to the seemingly simple case of time- 
periodic systems with one degree of freedom (see [9] for an overview of the 
properties of such systems). 

2 Quantum Time Evolution 

The time evolution of a quantum state $\psi$ is determined by the Schrödinger equation 
(1), which can be solved numerically by numerous methods. Among the most 
popular and efficient ones is an expansion in a discrete basis set, which converts 
the Schrödinger equation into a set of coupled linear differential equations, and 
- with an increasing number of applications - the direct solution as a partial 
differential equation in the coordinate representation, e.g., 

$$ i\hbar\,\frac{\partial \psi(x,t)}{\partial t} = H\psi(x,t) = \left( -\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2} + V(x,t) \right) \psi(x,t) . \tag{2} $$

In addition, a mixed treatment is also possible (and sometimes also the most 
efficient strategy), whereby some of the degrees of freedom are treated by a basis 
set expansion and the remaining ones are dealt with by solving a set of coupled 
differential equations. 

One of the most powerful techniques for solving equation (2) is the so-called 
split-operator method [13], whereby the time propagator $U$ for the Hamiltonian 
- split into a kinetic and potential energy part - 

$$ H\psi = \left( \frac{p^2}{2m} + V(x) \right) \psi = (K + V)\,\psi \tag{3} $$

can be approximated for a (small) time step $\delta$ by 

$$ U(\delta) = e^{-iH\delta/\hbar} \approx e^{-iK\delta/2\hbar}\, e^{-iV\delta/\hbar}\, e^{-iK\delta/2\hbar} . \tag{4} $$

Observing that the operator $V$ is diagonal in the coordinate representation (i.e., 
a simple multiplication by the number $V(x)$), whereas the operator $K$ is diagonal 
in the momentum representation (i.e., a multiplication by the number $p^2/2m$), 
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the time propagation can be easily carried out by switching between the two 
representations by means of a fast Fourier transformation [39]. 
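
A minimal sketch of one split-operator step may help to make the switching between representations concrete. It assumes a periodic grid, the symmetric ordering with half kinetic steps around a full potential step (one common choice), and NumPy's FFT conventions; the function names `gaussian_packet`, `split_operator_step`, and `propagate` are hypothetical and not taken from [13] or [39].

```python
import numpy as np

def gaussian_packet(x, q0, p0, s=1.0, hbar=1.0):
    """Minimum uncertainty wavepacket centred at (p0, q0); the phase convention is an assumption."""
    norm = (s / (np.pi * hbar)) ** 0.25
    return norm * np.exp(-s * (x - q0) ** 2 / (2.0 * hbar) + 1j * p0 * x / hbar)

def split_operator_step(psi, x, v_of_x, delta, hbar=1.0, m=1.0):
    """One step exp(-iK delta/2) exp(-iV delta) exp(-iK delta/2) on a periodic grid."""
    dx = x[1] - x[0]
    # Momentum values in the ordering used by numpy.fft
    p = 2.0 * np.pi * hbar * np.fft.fftfreq(x.size, d=dx)
    half_kinetic = np.exp(-1j * p ** 2 * delta / (4.0 * m * hbar))
    potential = np.exp(-1j * v_of_x(x) * delta / hbar)
    psi = np.fft.ifft(half_kinetic * np.fft.fft(psi))   # K is diagonal in p
    psi = potential * psi                               # V is diagonal in x
    return np.fft.ifft(half_kinetic * np.fft.fft(psi))

def propagate(psi, x, v_of_x, delta, n_steps, hbar=1.0, m=1.0):
    """Apply n_steps split-operator steps for a time-independent potential."""
    for _ in range(n_steps):
        psi = split_operator_step(psi, x, v_of_x, delta, hbar, m)
    return psi
```

For a time-dependent potential one would evaluate $V$ at a suitable time within each step; that refinement is left out here to keep the sketch short.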

Different propagation schemes have been developed; for a critical comparison 
see [29]. Some more recent techniques are the staggered-time algorithm [49], the 
(unitary) fourth order method [12], a multigrid method [2], and the $(t, t')$ method 
[37] based on an extended phase-space description for explicitly time-dependent 
systems. 

Here we will restrict ourselves to one degree of freedom, i.e., the numerical 
solution of 

$$ i\hbar\,\frac{\partial \psi(x,t)}{\partial t} = -\frac{\hbar^2}{2m}\frac{\partial^2 \psi(x,t)}{\partial x^2} + V(x,t)\,\psi(x,t) \tag{5} $$

with boundary conditions $\psi(x_{\min}, t) = \psi(x_{\max}, t) = 0$. We describe in some 
detail a numerical method, the so-called Goldberg algorithm [16, 27], which 
works very well in this case. 

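First we discretize the coordinate $x$ and construct the solution $\psi_j$ only at 
points $x_j = x_{\min} + j\epsilon$, for $j = 1, \ldots, j_{\max}$ and $x_{j_{\max}} = x_{\max}$. 

Using a discrete expression for the second derivative 

$$ \left( \frac{\partial^2 \psi}{\partial x^2} \right)_j \approx \frac{\psi_{j+1} - 2\psi_j + \psi_{j-1}}{\epsilon^2} , \tag{6} $$

the action of the Hamiltonian is given by 

$$ (H\psi)_j = -\frac{\hbar^2}{2m\epsilon^2} \left( \psi_{j+1} - 2\psi_j + \psi_{j-1} \right) + V(x_j, t)\,\psi_j . \tag{7} $$

Discretizing the time in equidistant steps $\delta$, i.e., $t_n = t_0 + n\delta$, the wavefunction 
$\psi_j^{n+1}$ at time $t_{n+1}$ is obtained from $\psi_j^n$ by 

$$ \psi_j^{n+1} = e^{-iH\delta/\hbar}\, \psi_j^n . \tag{8} $$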

Here it is not possible to approximate the exponential operator by $1 - iH\delta/\hbar$, 
because such a nonunitary approximation leads to instabilities of the temporal 
evolution. A unitary approximation is given by the Cayley form 

$$ e^{-iH\delta/\hbar} \approx \frac{1 - iH\delta/2\hbar}{1 + iH\delta/2\hbar} , \tag{9} $$

which is correct up to second order in $\delta$. Inserting now (9) into (8) and moving 
the denominator of (9) to the left-hand side in (8), we obtain the iteration scheme 

$$ \psi_{j+1}^{n+1} + (i\lambda - \Omega_j^n - 2)\,\psi_j^{n+1} + \psi_{j-1}^{n+1} = -\psi_{j+1}^n + (i\lambda + \Omega_j^n + 2)\,\psi_j^n - \psi_{j-1}^n \tag{10} $$

with $\lambda = 4m\epsilon^2/\hbar\delta$ and $\Omega_j^n = 2m\epsilon^2 V(x_j, t_n)/\hbar^2$. These difference equations are 
stable and unitary, however implicit. The solution of the tridiagonal matrix equation 
(10) is a standard problem of numerical mathematics, which is solved by 
recursion [16, 27] using the boundary conditions $\psi_0 = \psi_{j_{\max}} = 0$. This method 
can be easily implemented numerically and allows a fast real-time solution of 
the Schrodinger equation on a PC. For the time-independent case, PC programs 
with graphical representation are available [3] and can be used for illustrat- 
ing phenomena of elementary quantum mechanics, e.g., motion of wavepackets 
in various potentials; dynamics of coherent and squeezed harmonic oscillator 
states; tunneling through potential barriers, etc. 
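
The iteration scheme (10) can be sketched in a few lines. The version below is an illustration under stated assumptions, not the published program: it uses the same potential values on both sides of (10), solves the tridiagonal system with a plain Thomas recursion, and the names `solve_tridiagonal` and `cayley_step` are hypothetical.

```python
import numpy as np

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas recursion: lower[j]*u[j-1] + diag[j]*u[j] + upper[j]*u[j+1] = rhs[j]."""
    n = diag.size
    c = np.zeros(n, dtype=complex)
    d = np.zeros(n, dtype=complex)
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for j in range(1, n):                       # forward elimination
        denom = diag[j] - lower[j] * c[j - 1]
        c[j] = upper[j] / denom
        d[j] = (rhs[j] - lower[j] * d[j - 1]) / denom
    u = np.zeros(n, dtype=complex)
    u[-1] = d[-1]
    for j in range(n - 2, -1, -1):              # back substitution
        u[j] = d[j] - c[j] * u[j + 1]
    return u

def cayley_step(psi, v, eps, delta, hbar=1.0, m=1.0):
    """Advance psi (with psi[0] = psi[-1] = 0) by one time step delta using (10)."""
    lam = 4.0 * m * eps ** 2 / (hbar * delta)
    omega = 2.0 * m * eps ** 2 * v[1:-1] / hbar ** 2
    # Right-hand side of (10), built from the known wavefunction at time t_n
    rhs = -psi[2:] + (1j * lam + omega + 2.0) * psi[1:-1] - psi[:-2]
    ones = np.ones(omega.size, dtype=complex)
    new = np.zeros(psi.size, dtype=complex)     # boundary values stay zero
    new[1:-1] = solve_tridiagonal(ones, 1j * lam - omega - 2.0, ones, rhs)
    return new
```

Because the Cayley form is unitary for a Hermitian discretized Hamiltonian, the grid norm $\sum_j |\psi_j|^2 \epsilon$ should stay constant up to rounding errors, which is a convenient sanity check for any implementation.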

A few remarks on a reasonable choice of the parameters will be helpful. First, 
the mesh width $\epsilon$ determines the smallest wavelength that can be accurately 
described on a discrete grid. A typical choice is $\epsilon = \hbar/5p_{\max}$, where $p_{\max}$ is the 
largest classical momentum. A reasonable choice of the time step is $\delta = \ldots$, 
leading to a balance of the errors induced by time and space discretization (see 
[16, 27, 39] for more details). 

It should be noted that a direct extension of the Goldberg algorithm to 
systems with more degrees of freedom requires a matrix inversion at each time 
step (the solution by recursion is no longer possible) and is therefore inefficient. 
Special techniques have been developed, however, which reduce the problem to 
intertwined one-dimensional ones; for details see [42, 43]. 

3 Quantum State Tomography 

3.1 Phase-Space Distributions 

In classical dynamics, the equations of motion (or experimental measurements) 
yield the trajectory $q(t)$ of the system for given initial conditions. This trajectory 
contains all information, but important dynamical features are only visible if 
they are carefully extracted, e.g., by plotting the trajectory in adequate variables. 
Typically, one analyzes the dynamics in phase space $(p, q)$, where the momentum 
$p(t)$ can be obtained from $q(t)$ by differentiation. In addition, one can restrict 
oneself to a section of phase space, the Poincaré section, as, for example, in the 
stroboscopic plot shown in Fig. 1 for a driven anharmonic oscillator. Such a plot 
reveals the dynamical properties of the system for all initial conditions and it 
provides a global description of the dynamical features of the system. 

In quantum mechanics the situation is similar. As described above, one can 
numerically generate the wavefunction $\psi(x, t)$ in coordinate space as a function 
of time, but the essential dynamical features are still to be determined. The 
absolute square $|\psi(x, t)|^2$ yields the probability of finding the particle at a given 
position, and by means of a Fourier transform of $\psi(x, t)$ to momentum space one 
obtains the overall probability for the momentum. In order to obtain information 
about the momentum distribution at a given position, one can use the Gabor 
(or Fourier window) transformation 

$$ \langle p, q | \psi \rangle = \left( \frac{s}{\pi\hbar} \right)^{1/4} \int \exp\!\left[ -\frac{s (x - q)^2}{2\hbar} - \frac{i}{\hbar}\, p x \right] \psi(x)\, dx . \tag{11} $$
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This is a Fourier transform to momentum (p) space, which is weighted by a 
Gaussian window centered at position q. The so-called squeezing parameter s 
controls the width of the window. Up to a multiplicative factor, $\langle p, q | \psi \rangle$ is equal 
to the momentum distribution for s = 0, and the coordinate representation is 
reproduced for large s. Equation (11) can also be expressed as a projection of 
the wavefunction onto a so-called minimum uncertainty wavepacket (also called 
a coherent state): 



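$$ \phi_{p,q}(x) = \left( \frac{s}{\pi\hbar} \right)^{1/4} \exp\!\left[ -\frac{s (x - q)^2}{2\hbar} + \frac{i}{\hbar}\, p x \right] , \tag{12} $$

a state with mean values $q$ and $p$ for position and momentum, respectively, and 
the uncertainties $\Delta p = \sqrt{\hbar s/2}$, $\Delta q = \sqrt{\hbar/2s}$, $\Delta p\,\Delta q = \hbar/2$. The squeezing 
parameter $s = \Delta p/\Delta q$ determines the ratio of the uncertainties and can be 
adapted to the problem under investigation. The absolute square 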



$$ \rho_H(p, q) = \frac{1}{2\pi\hbar}\, \left| \langle p, q | \psi \rangle \right|^2 \tag{13} $$

with normalization 

$$ \int \rho_H(p, q)\, dp\, dq = 1 \tag{14} $$

is called the Husimi density [25]. It provides a quantum mechanical (quasi-)probability 
distribution in phase space for a given wavefunction $\psi$ and is very useful 
in an analysis of the classical-quantum correspondence in dynamical systems 
[45]. 
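
The Husimi density can be evaluated directly from a wavefunction sampled on a grid by projecting onto minimum uncertainty wavepackets. The sketch below assumes the normalization $\rho_H = |\langle p, q | \psi \rangle|^2 / 2\pi\hbar$, a plane-wave phase $e^{ipx/\hbar}$ in the wavepacket (the density does not depend on this phase convention), and a simple Riemann sum for the overlap integral; `coherent_state` and `husimi_density` are hypothetical names.

```python
import numpy as np

def coherent_state(x, p, q, s=1.0, hbar=1.0):
    """Minimum uncertainty wavepacket with mean momentum p and mean position q."""
    norm = (s / (np.pi * hbar)) ** 0.25
    return norm * np.exp(-s * (x - q) ** 2 / (2.0 * hbar) + 1j * p * x / hbar)

def husimi_density(psi, x, p_grid, q_grid, s=1.0, hbar=1.0):
    """Return rho[a, b] = |<p_a, q_b | psi>|^2 / (2 pi hbar) on the given (p, q) grid."""
    dx = x[1] - x[0]
    rho = np.empty((p_grid.size, q_grid.size))
    for a, p in enumerate(p_grid):
        for b, q in enumerate(q_grid):
            # Overlap integral approximated by a Riemann sum on the x grid
            overlap = np.sum(np.conj(coherent_state(x, p, q, s, hbar)) * psi) * dx
            rho[a, b] = abs(overlap) ** 2 / (2.0 * np.pi * hbar)
    return rho
```

The double loop is written for clarity; for large grids the overlaps for all $p$ at fixed $q$ can be obtained with a single fast Fourier transform of the windowed wavefunction.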



3.2 Phase-Space Entropy 



The overall degree of localization in phase space can be obtained from the average 
information content measured by the phase-space entropy, 

$$ S = -\int \rho_H(p, q)\, \ln\!\left[ 2\pi\hbar\, \rho_H(p, q) \right] dp\, dq . \tag{15} $$

This entropy satisfies the inequality $S \ge 1$ [50], which corresponds to the uncertainty 
relation $\Delta p\,\Delta q \ge \hbar/2$. The quantity $e^S$ measures the number of minimum 
uncertainty states populated by the wavepacket and $A = 2\pi\hbar\, e^S$ is the phase-space 
area covered by the distribution. It is instructive to calculate the 
entropy of a minimum uncertainty wavepacket (12) with squeezing parameter $s_0$ 
analyzed by minimum uncertainty states with squeezing $s_1$. The result is simply 

$$ S = 1 + \ln \frac{s_0 + s_1}{2\sqrt{s_0 s_1}} . \tag{16} $$

Note that for $s_0 = s_1$ one obtains the smallest possible entropy $S = 1$, as 
expected for a minimum uncertainty state [50]. 

As an example, Figs. 2 and 3 show the time evolution of the Husimi distribution 
of a minimum uncertainty wavepacket with $s_0 = s_1 = 1$ and $s_0 = s_1 = 2$ 
moving in a harmonic potential with unit mass and frequency. In Fig. 2, the 
wavefunction is a coherent state of the harmonic oscillator and moves without 
changing its form. In Fig. 3, we have a squeezed oscillator state, which changes 
its form and uncertainty product as monitored by the entropy $S$. 
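
The entropy of a squeezed state analyzed with a different squeezing can be checked numerically. The sketch below assumes that the entropy is defined as $S = -\int \rho_H \ln(2\pi\hbar\,\rho_H)\,dp\,dq$ with $\int \rho_H\,dp\,dq = 1$, and it uses the standard result that the Husimi density of a Gaussian state is again a Gaussian whose variances are the sums of the variances of the state and of the analyzing wavepacket. The function names are hypothetical.

```python
import numpy as np

def husimi_of_gaussian(p, q, s0, s1, hbar=1.0):
    """Husimi density of a minimum uncertainty state (squeezing s0, centred at the
    origin) analyzed with wavepackets of squeezing s1; variances simply add."""
    var_q = hbar / (2.0 * s0) + hbar / (2.0 * s1)
    var_p = hbar * s0 / 2.0 + hbar * s1 / 2.0
    norm = 2.0 * np.pi * np.sqrt(var_q * var_p)
    return np.exp(-q ** 2 / (2.0 * var_q) - p ** 2 / (2.0 * var_p)) / norm

def phase_space_entropy(rho, dp, dq, hbar=1.0):
    """S = -sum rho ln(2 pi hbar rho) dp dq, skipping cells where rho vanishes."""
    positive = rho > 0.0
    values = rho[positive]
    return -np.sum(values * np.log(2.0 * np.pi * hbar * values)) * dp * dq

def entropy_formula(s0, s1):
    """Closed form S = 1 + ln[(s0 + s1) / (2 sqrt(s0 s1))]."""
    return 1.0 + np.log((s0 + s1) / (2.0 * np.sqrt(s0 * s1)))
```

Evaluating `phase_space_entropy` on a sufficiently fine grid for several pairs $(s_0, s_1)$ and comparing with `entropy_formula` gives a quick consistency test of an entropy routine before it is applied to numerically propagated wavefunctions.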
Fig. 2. Contour plot of the Husimi distribution (squeezing parameter $s_1 = 1$) 
for an initial minimum uncertainty wavepacket ($s_0 = 1$) moving in a harmonic 
oscillator with unit mass and frequency 



Fig. 3. Contour plot of the Husimi distribution (squeezing parameter $s_1 = 2$) 
for an initial minimum uncertainty wavepacket ($s_0 = 2$) moving in a harmonic 
oscillator with unit mass and frequency 



4 Case Study: A Driven Anharmonic Quantum Oscillator 



Various paradigmatic systems are investigated repeatedly in the literature to 
explore the properties of quantum systems, which are classically chaotic. Here 
we study the time evolution of a wavepacket for a forced quartic oscillator 

$$ H(p, q, t) = \frac{p^2}{2m} + b q^4 - f q \cos(\omega t) , \tag{17} $$

which is time-periodic with period $T = 2\pi/\omega$. We choose parameter values $b = 0.25$, 
$f = 0.5$ and $\omega = 1$, a case for which the classical-quantum correspondence 
[5, 6] and the semiclassical EBK quantization of regular quasienergy states [4, 
46, 47] have been investigated recently (see also [31], where different parameters 
are chosen). 

4.1 Classical Phase-Space Dynamics 



The classical dynamics for this system is typical and shows a mixed regular 
and chaotic behavior depending on the initial conditions. Solving the classical 
equations of motion 

$$ \frac{dp}{dt} = -\frac{\partial H}{\partial q} = -4 b q^3 + f \cos(\omega t) , \qquad \frac{dq}{dt} = \frac{\partial H}{\partial p} = \frac{p}{m} \tag{18} $$

for a specified initial condition $(p(0), q(0)) = (p_0, q_0)$ one obtains the phase-space 
trajectory $(p(t), q(t))$. Figure 1 shows a stroboscopic plot of the trajectory 
at times $t_n = nT$, $n = 0, 1, 2, \ldots$ for selected initial conditions. There is a clear 
division of phase space into three different regions. A chaotic region generated by 
a single trajectory is sharply separated from an outer regular region. A second 
regular region is centered on a T-periodic trajectory and appears as a regular 
island embedded in a chaotic sea. By closer inspection, one observes additional 
smaller chains of stability islands close to the boundary between the inner island 
and the chaotic sea. The phase-space area of the inner island is 2.25 and the 
chaotic sea covers an area of 7.85. 
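
A stroboscopic section of this kind can be generated with any standard integrator. The following sketch integrates the equations of motion (18) with a fixed-step fourth-order Runge-Kutta scheme and records the phase-space point once per driving period. The unit mass, the integrator, and the step count are assumptions made for illustration; the text does not state which method was used for Fig. 1.

```python
import numpy as np

def equations_of_motion(t, y, b, f, omega, m):
    """Right-hand side of (18) for y = (q, p)."""
    q, p = y
    return np.array([p / m, -4.0 * b * q ** 3 + f * np.cos(omega * t)])

def rk4_step(t, y, h, b, f, omega, m):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = equations_of_motion(t, y, b, f, omega, m)
    k2 = equations_of_motion(t + 0.5 * h, y + 0.5 * h * k1, b, f, omega, m)
    k3 = equations_of_motion(t + 0.5 * h, y + 0.5 * h * k2, b, f, omega, m)
    k4 = equations_of_motion(t + h, y + h * k3, b, f, omega, m)
    return y + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def stroboscopic_section(q0, p0, n_periods, steps_per_period=400,
                         b=0.25, f=0.5, omega=1.0, m=1.0):
    """Return an array of (q, p) points sampled at times t_n = n T, n = 0..n_periods."""
    period = 2.0 * np.pi / omega
    h = period / steps_per_period
    y = np.array([q0, p0], dtype=float)
    points = [y.copy()]
    step = 0
    for _ in range(n_periods):
        for _ in range(steps_per_period):
            y = rk4_step(step * h, y, h, b, f, omega, m)
            step += 1
        points.append(y.copy())
    return np.array(points)
```

Plotting the returned points for a handful of initial conditions, with $q$ on the horizontal and $p$ on the vertical axis, should reproduce the qualitative division into regular curves and a chaotic sea. Long chaotic trajectories are sensitive to the step size, so the step count should be increased until the picture no longer changes.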

For different choices of the parameters, the overall appearance (a chaotic sea 
bounded by an outer regular region) is the same. The detailed structure of the 
inner stability islands changes, however, and shows characteristic bifurcations. 
The case studied here is selected because of its structural simplicity. 



4.2 Quantum Phase-Space Dynamics 



A minimum uncertainty wavepacket $\psi_{p_0,q_0}(x, 0)$, see (12), localized initially 
at a position $(p_0, q_0)$ in phase space is propagated in time using a value $\hbar = 0.05$ 
(note that in the dimensionless units used here - by, for example, setting the 
field frequency equal to unity - the Planck constant of the system depends on 
the parameters and can therefore be adjusted). At times $t_n = nT$, the Husimi 
density 

$$ \rho_H(p, q; p_0, q_0; t_n) = \frac{1}{2\pi\hbar}\, \left| \langle p, q | \psi_{p_0,q_0}(t_n) \rangle \right|^2 \tag{19} $$

is computed on a grid of 50 × 50 points in $(p, q)$ space. The grid covers the same 
region as the classical phase space shown in Fig. 1. 

Let us first study the quantum phase-space dynamics for a wavepacket localized 
at $(p_0, q_0) = (0, 0.6)$ initially, which is inside the classically chaotic region 
(the squeezing parameter of $s = 5$ is adapted to a harmonic approximation to the 
center of the inner regular island). In the computation, the Goldberg algorithm 
(see Sect. 2) is used with mesh size $\epsilon = 0.005$ and time step $\delta = 0.001$. Figures 
4 and 5 show the Husimi densities for the first 30 periods as contour plots. 
Dark regions mark large probabilities and correspond to strong localization of 
the wavefunction. 





Fig. 4. Contour plots of the first 20 Husimi distributions at times $t_n = nT$. Shown is 
the region [-3:3] with $q$ on the horizontal and $p$ on the vertical axis. Picture number 
0 shows the initial minimum uncertainty state at $(p_0, q_0) = (0, 0.6)$ with squeezing 
parameter $s = 5$ 

Fig. 5. Same as Fig. 4 for times $t_n$, $n = 20, \ldots, 30$. The last graph shows an average 
over periods 20 to 120 



The first impression from Figs. 4 and 5 is of an approximately periodic circulation 
of the center of the distribution with period $3T$. This can be checked 
quantitatively by computing the autocorrelation 

$$ C(t, t_0) = \int \psi^*(x, t_0)\, \psi(x, t)\, dx , \tag{20} $$

which measures the overlap of the wavefunction at time $t$ with the initial distribution, 
and the recurrence probability 

$$ P_R(t, t_0) = |C(t, t_0)|^2 . \tag{21} $$

Figure 6 shows the recurrence probability for the first 50 periods. Up to $t = 20T$, 
one observes clear maxima at multiples of three. This periodicity is reflected in 
the corresponding Fourier spectrum shown in Fig. 7. The strongest peak of the 
frequency spectrum appears at $\nu = 0.33/T$, which corresponds to period three. 
The same period-three circulation can be found if an initially Gaussian ensemble 
is propagated classically. 
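
The recurrence analysis amounts to an overlap integral followed by a discrete Fourier transform of the stroboscopic samples. A minimal sketch, assuming wavefunctions stored on a uniform grid and samples taken once per period, could look as follows; `recurrence_probability` and `dominant_frequency` are hypothetical helper names.

```python
import numpy as np

def recurrence_probability(psi_initial, psi_t, dx):
    """|C(t, t0)|^2 with the overlap integral (20) approximated by a Riemann sum."""
    overlap = np.sum(np.conj(psi_initial) * psi_t) * dx
    return abs(overlap) ** 2

def dominant_frequency(samples):
    """Strongest frequency, in units of 1/T, of a signal sampled at t_n = n T."""
    signal = np.asarray(samples, dtype=float)
    signal = signal - signal.mean()            # remove the zero-frequency peak
    spectrum = np.abs(np.fft.rfft(signal))
    frequencies = np.fft.rfftfreq(signal.size, d=1.0)
    return frequencies[np.argmax(spectrum)]
```

With one sample per period the largest resolvable frequency is $0.5/T$, so a period-three recurrence shows up as a peak near $0.33/T$, well inside the accessible range.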
Fig. 6. Probability $|C(t, t_0)|^2$ for recurrence of the wavepacket at its initial 
position (horizontal axis: time in units of $T = 2\pi$) 

Fig. 7. Fourier spectrum of the recurrence probability shown in Fig. 6 
(horizontal axis: frequency $\nu$ in units of $1/T$) 



The increasing delocalization of the wavepacket can be measured quantitatively 
by means of the phase-space entropy (15). Figure 8 shows the entropy 

$$ S_{p_0,q_0}(t_n) = -\int \rho_H(p, q; p_0, q_0; t_n)\, \ln\!\left[ 2\pi\hbar\, \rho_H(p, q; p_0, q_0; t_n) \right] dp\, dq \tag{22} $$

for the first 120 periods. Starting from the value of 1 for the initial minimum 
uncertainty state at time zero, the entropy increases within the first 20 periods, 
then it flattens into a plateau. For long times, the entropy fluctuates almost 




Fig. 8. Time dependence of the phase-space entropy for the distributions shown in 
Figs. 6 and 7 (horizontal axis: time $t$ from 0 to $120\,T$, $T = 2\pi$) 
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Fig. 9. Contour plot of a time-averaged quantum Husimi distribution for wavepacket 
(c) and classical Poincaré section for a chaotic trajectory 



erratically with an average value of about $S = 3.1$, which is somewhat below 
the value of $S_{\rm cl} = \ln(A/2\pi\hbar) \approx 3.2$ obtained from the classical chaotic phase-space 
area $A = 7.85$. The entropy difference is due to the fluctuation of the 
quantum dynamics in contrast to the classical dynamics, which approaches a 
uniform limiting distribution (see also the more detailed analysis for a driven-rotor 
system [33] based on the random vector model). 
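
As a quick check of the quoted classical value, inserting the numbers given in the text ($A = 7.85$, $\hbar = 0.05$) yields

$$ S_{\rm cl} = \ln \frac{7.85}{2\pi \times 0.05} = \ln 24.99 \approx 3.22 , $$

i.e., the chaotic sea accommodates about 25 phase-space cells of area $2\pi\hbar$.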

One can also study the long time average of the Husimi distributions 

$$ \bar{\rho}_H(p, q; p_0, q_0) = \lim_{N \to \infty} \frac{1}{N - N_0 + 1} \sum_{n=N_0}^{N} \rho_H(p, q; p_0, q_0; t_n) ; \tag{23} $$

in numerical computations using a finite value of $N$, the first $N_0$ distributions 
during the initial delocalization should be neglected to improve the convergence. 
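
For a finite run, the average (23) is a plain mean over the stored distributions with the initial transient skipped. A short sketch, assuming the densities for $n = 0, \ldots, N$ are kept in a list of equally shaped arrays (the name `time_averaged_husimi` is hypothetical):

```python
import numpy as np

def time_averaged_husimi(densities, n_skip):
    """Mean of densities[n_skip:], i.e. weight 1/(N - N0 + 1) for n = N0..N."""
    stack = np.asarray(densities, dtype=float)
    if not 0 <= n_skip < stack.shape[0]:
        raise ValueError("n_skip must leave at least one distribution to average")
    return stack[n_skip:].mean(axis=0)
```

For long runs it is usually preferable to accumulate the running sum during the propagation instead of storing every distribution.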

Figure 10 shows three time-averaged Husimi distributions for different initial 
wavepackets $\psi_{p_0,q_0}$ with $p_0 = 0$ and $q_0 = 1.4$ (at the center of the 
stability island), $q_0 = 0.6$ (in the chaotic region), and $q_0 = -1.0$ (in the outer 
regular region). The wavefunction has been propagated over $N = 101$ periods; 
the first $N_0 = 20$ periods are not included in the average. 

The time-averaged distributions can be compared with the classical Poincaré 
section in Fig. 1. We observe a clear correspondence with classical phase-space 
dynamics in the three regions. In case (a) the distribution remains localized on 
the stability island. In case (b) the distribution spreads out over the classically 
chaotic region, showing, however, an additional quantum localization in three 
regions of phase space (see Fig. 9), which is a quantum interference phenomenon. 
On the other hand, the strong concentration on the region close to $(0, -1)$ in case 
(c) is a classical effect [4], which is also reproduced by propagation of classical 
phase-space densities. 

Fig. 10. Time-averaged Husimi distributions of a minimum uncertainty wavepacket 
located initially at $\psi_{p_0,q_0}$ with $p_0 = 0$ and $q_0 = 1.4$ (at the center of the stability 
island), $q_0 = 0.6$ (in the chaotic region), and $q_0 = -1.0$ (in the outer regular region) 
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4.3 Quasienergy Spectra 

The time evolution of a wavepacket also provides information about the spectrum 
of the system, which can be computed from the autocorrelation function (20). 
In the present case of time-periodic Hamiltonians $H(t + T) = H(t)$, this is the 
spectrum of the quasienergies $\epsilon_\alpha$ defined by the quasienergy states (or Floquet 
states) 

$$ \psi_\alpha(t) = e^{-i\epsilon_\alpha t/\hbar}\, u_\alpha(t) , \tag{24} $$

where the $u_\alpha$ are $T$-periodic. This definition determines the quasienergies only 
up to integer multiples of $\hbar\omega = 2\pi\hbar/T$. Often it is therefore convenient to introduce 
the quasiangles $\theta_\alpha = \epsilon_\alpha T/\hbar$. Expanding the wavefunction in terms of the 
quasienergy states, $\psi(t) = \sum_\alpha c_\alpha \psi_\alpha(t)$ with constant coefficients $c_\alpha$, the Fourier 
transform of the autocorrelation 

$$ C_n = C(t_n, t_0) = \sum_\alpha |c_\alpha|^2\, e^{-i n \theta_\alpha} \tag{25} $$

after $n$ periods (compare (20)) yields 

$$ C(\theta) = \sum_n C_n\, e^{i n \theta} = 2\pi \sum_\alpha |c_\alpha|^2\, \delta(\theta - \theta_\alpha) . \tag{26} $$

In practice, the computation is not extended to infinity and for a finite cutoff 
at $n_{\max}$ a window function, e.g., $w_n = (1 - \cos(2\pi n/n_{\max}))/2$, must be introduced 
into the $n$-sum in (26), which reduces the spurious oscillations produced 
by the cutoff and smooths the $\delta$-functions into line shape functions $\mathcal{L}(\theta - \theta_\alpha)$, 




Fig. 11. Fourier transform of the autocorrelation function for a wavepacket placed 
initially at $(1, -0.8)$, which is inside the chaotic sea and close to the boundary to the 
outer regular region (see Fig. 1). The peaks appear at the quasiangles $\theta_\alpha$ 
(horizontal axis: quasiangle) 

which are determined by the window function. In addition, the quasienergy function 
at time $t_0 = 0$ is given by 

$$ \psi_\alpha(0) \propto \sum_{n=0}^{N} e^{i n \theta_\alpha}\, \psi(t_n) , \tag{27} $$

where the proportionality constant is obtained by normalization. As an example, 
Fig. 11 shows the Fourier transform of the autocorrelation function for a 
wavepacket initially placed at $(1, -0.8)$, i.e., inside the chaotic region of Fig. 1. 
The peaks appear at the quasiangles $\theta_\alpha$. 

A subsequent analysis of the Husimi distributions of the corresponding quasienergy 
states $\psi_\alpha$ provides information about the localization properties on different 
regions in phase space. Furthermore, by computation of a sufficiently large 
number of quasienergies or quasiangles, the statistical properties of the quasi- 
energy spectra can be tested. The prediction is, for example, that the nearest- 
neighbor distance follows a Poisson distribution for those states localizing on 
a regular region, whereas those corresponding to a chaotic regime are Wigner 
distributed (see, for example, [23] and references therein). 
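
The windowed Fourier transform of the stroboscopic autocorrelation can be sketched as follows. The code assumes samples $C_n$ for $n = 0, \ldots, n_{\max}$, the cosine window $w_n = (1 - \cos(2\pi n/n_{\max}))/2$, and a user-supplied grid of quasiangles; `quasiangle_spectrum` is a hypothetical name.

```python
import numpy as np

def quasiangle_spectrum(autocorrelation, thetas):
    """Return |sum_n w_n C_n exp(i n theta)| for each theta in thetas."""
    c = np.asarray(autocorrelation, dtype=complex)
    n = np.arange(c.size)
    n_max = c.size - 1
    window = 0.5 * (1.0 - np.cos(2.0 * np.pi * n / n_max))
    # One row of phase factors exp(i n theta) per requested quasiangle
    phases = np.exp(1j * np.outer(thetas, n))
    return np.abs(phases @ (window * c))
```

Peaks of the returned spectrum mark the quasiangles $\theta_\alpha$, with heights proportional to the weights $|c_\alpha|^2$. Two quasiangles closer than roughly $4\pi/n_{\max}$ merge into a single line, which sets the number of periods needed to resolve a narrow doublet.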



4.4 Chaotic Tunneling 

As an interesting application, one can investigate the influence of a time-periodic 
field on the dynamics of a wavepacket moving in a bistable potential, e.g., a 
double minimum potential. One system studied recently by various authors is 
the potential 

$$ V(q, t) = b q^4 - d q^2 + f q \cos(\omega t) , \tag{28} $$

a frictionless Duffing oscillator, with values $b = 0.5$, $d = 10$, $\omega = 6.07$, and $\hbar = 1$ 
for the parameters [30]. 

For a vanishing external driving field, we have a simple time-independent 
double-well potential and the energy $E$ is conserved. For energies below the barrier, 
a quantum wavepacket localized in the left potential well tunnels through 
the barrier - the classically inaccessible region in phase space - and appears on 
the right-hand side. This process continues and we observe a tunneling oscillation 
between the two minima. This phenomenon is well understood and can be 
described semiclassically (e.g., [10]) in terms of the action integral 

$$ \kappa = \frac{1}{\hbar} \int_{q_-}^{q_+} \sqrt{2m\,|E - V(q)|}\; dq \tag{29} $$

over the barrier, where $q_\pm$ are the turning points. Well below the barrier, i.e., 
for large $\kappa$, the tunneling probability is $e^{-2\kappa}$ and the tunneling splitting of the 
almost degenerate energy eigenvalues is given by 

$$ \Delta E = \frac{\hbar\omega}{\pi}\, e^{-\kappa} , \tag{30} $$

where $\omega$ is the classical frequency in a single well. A superposition of these states 
oscillates with period $T_{\rm osc} = 2\pi\hbar/\Delta E$ between the two wells. 
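
To get a feeling for the orders of magnitude, the action integral (29) can be evaluated numerically for the undriven double well $V(q) = b q^4 - d q^2$. The sketch below makes several assumptions that are not in the text: the energy $E$ is a free illustrative input, the splitting is taken in the standard semiclassical form $\Delta E = (\hbar\omega/\pi)\,e^{-\kappa}$, and the single-well frequency is replaced by its small-oscillation value $\sqrt{4d/m}$. All function names are hypothetical.

```python
import numpy as np

def double_well(q, b=0.5, d=10.0):
    """Undriven potential V(q) = b q^4 - d q^2 (barrier top at V = 0)."""
    return b * q ** 4 - d * q ** 2

def action_integral(energy, b=0.5, d=10.0, m=1.0, hbar=1.0, n_cells=20000):
    """kappa of (29) for -d^2/(4b) < energy < 0, using the midpoint rule."""
    # Inner turning points solve b q^4 - d q^2 = energy
    q_turn = np.sqrt((d - np.sqrt(d ** 2 + 4.0 * b * energy)) / (2.0 * b))
    edges = np.linspace(-q_turn, q_turn, n_cells + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    width = edges[1] - edges[0]
    integrand = np.sqrt(2.0 * m * np.abs(energy - double_well(mid, b, d)))
    return np.sum(integrand) * width / hbar

def tunneling_splitting(energy, b=0.5, d=10.0, m=1.0, hbar=1.0):
    """Semiclassical estimate (hbar omega / pi) exp(-kappa), harmonic omega assumed."""
    omega_well = np.sqrt(4.0 * d / m)          # V''(q_min) = 4 d at the minima
    kappa = action_integral(energy, b, d, m, hbar)
    return hbar * omega_well / np.pi * np.exp(-kappa)
```

The corresponding oscillation period follows as $2\pi\hbar/\Delta E$. Since the splitting depends exponentially on $\kappa$, small changes in $E$ or $\hbar$ change the tunneling time by orders of magnitude.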

When the field is switched on, the situation is much more complicated due to 
the fact that the classical motion is chaotic and there is no conserved quantity 
in the chaotic region between the wells, and hence no equivalent of the tunneling 
integral (29). Typically, the two potential minima turn into stability islands and 
the curve separating the single-well motion from the double-well oscillation at 
higher energies (the ‘separatrix’) is destroyed by the interaction with the field. 
Instead, a chaotic separatrix layer develops, which grows with increasing field 
strength. 

Tunneling through such a chaotic layer is still far from being understood 
and the theory of such tunneling transitions is an active field of contemporary 
research (see, for example, [36, 38, 7], or [20, 18, 19, 48] for studies of a driven 
double-well oscillator and [17, 7, 1] for related studies of a kicked or driven 
rotor). 

Figure 12 shows a stroboscopic Poincaré section of the system (28) for the 
parameters given above. Results from three classical trajectories are shown: one 
trajectory generates the chaotic sea, the other two are regular and move around 
the left or right island, respectively. Transitions between the left and the right 
island are classically forbidden. Quantum mechanically, however, such a transition 
is allowed. 

On the right-hand side of Fig. 12, a wavepacket started on the left island 
(more precisely a minimum uncertainty wavepacket (12) centered at $(p_0, q_0) = (0, -1.5)$ 
with $s = 1$) is shown after one period $T$ of the driving field. It is 
obvious that the distribution is beginning to populate the right stability island. 
After 58 periods, a considerable part of the distribution is found there, and after 





Fig. 12. Poincaré section of classical phase space (left) and quantum Husimi distribution 
(right) of a wavepacket started on the left island after one period 
(horizontal axes: position q) 
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115 periods almost the whole wavepacket has tunneled to the opposite island, as 
demonstrated in Fig. 13. It is remarkable that the distribution localizes on the 
regular island again, in spite of the fact that it has tunneled through the region 
where the classical dynamics is chaotic. 




Fig. 13. Quantum Husimi distribution of a wavepacket started on the left island after 
58 (left) and 115 (right) periods (horizontal axes: position q) 




Fig. 14. Recurrence probability as a function of time 



The recurrence probability (21) in Fig. 14 shows the continuation of the tunneling 
process with increasing time. We observe an overall oscillation with period 
$114\,T$ of a certain fraction of the distribution between the two islands. This ‘coherent 
tunneling’ [30] can be explained in terms of quasienergy (Floquet) states 
of different symmetry localizing on both islands [17, 36, 38, 1]. Their superposition 
leads to states oscillating between the islands with a period proportional 
to the inverse of the difference of the two quasienergies. This can be checked by 
computing the quasienergies and the Husimi distribution from the autocorrelation 
function as discussed in Sect. 4.3. The results are shown in Figs. 15 and 16. 
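
A rough estimate of the doublet splitting follows from the observed period. The estimate assumes that the proportionality constant is the same as for the static double well, $T_{\rm osc} = 2\pi\hbar/\Delta\epsilon$, and it uses $\omega = 6.07$ together with $\hbar = 1$; both the relation and the value of $\hbar$ are assumptions made here for illustration.

$$ \Delta\epsilon = \frac{2\pi\hbar}{114\,T} = \frac{\hbar\omega}{114} \approx 0.053 , \qquad \Delta\theta = \frac{\Delta\epsilon\, T}{\hbar} = \frac{2\pi}{114} \approx 0.055 . $$

Resolving two quasiangles this close in a Fourier spectrum requires an autocorrelation record considerably longer than 114 periods.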
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Fig. 15. Quasienergy spectrum of the autocorrelation function (see Fig. 14). The inset 
shows a magnification of the two strongest peaks 
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Fig. 16. Husimi distributions of the quasienergy doublet shown in Fig. 15 
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Recent observations from numerical computations suggest, that the quasien- 
ergy splittings for chaotic tunneling do not follow the simple semiclassical law 
(30) when $\hbar$ is varied [41]. Instead, they show a seemingly irregular behavior. 

5 Concluding Remarks 

In this article we have tried to demonstrate some of the numerical techniques 
used to explore the manifestation of classical chaos in the corresponding quantum 
system. We have confined ourselves to the case of a driven anharmonic oscillator 
and presented some of the techniques using quantum dynamics in phase space. 
Let us finally give a brief description of a method suggested recently for developing 
a global picture of the phase-space structure of a quantum system [32]. The 
basic idea is to compute the long-time average $S(p_0, q_0)$ of quantum phase-space 
entropy (22) for all initial positions $(p_0, q_0)$ of the initial wavepacket. This function 
provides a quantitative measure of the phase-space localization properties 
of the quantum system in analogy to the classical Poincaré section. Numerical 
studies for a driven-rotor system [33] have been reported recently [32] and the 
(semiclassical?) explanation of this phenomenon is not yet clear. 
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Abstract. An introductory discussion of quantum field theory as the theoretical 
framework for particle physics and the standard model is given. For the simplified 
scalar field sector, the lattice formulation and nonperturbative evaluation by Monte 
Carlo simulation are described. A FORTRAN package is included that allows the sim- 
ulation of this model for arbitrary dimension. 



1 Quantum Field Theory and Particle Physics 

In this introductory section we briefly outline the role of quantum field theory 
as the generally accepted framework for a theoretical description of elementary 
particle physics. 



1.1 Particles, Fields, Standard Model 

A simple example of particle or quanta aspects of fields is given by photons and 
the Maxwell theory. Let us imagine classical electric and magnetic fields in an 
(ideal) metallic cavity. The walls represent certain boundary conditions which 
lead to a denumerably infinite sequence of possible standing waves. Schematically, 
omitting polarization, we simply label these modes by a discrete set of 
wavenumber vectors $\{\mathbf{k}\}$ and associated frequencies $\omega(\mathbf{k}) = c|\mathbf{k}|$. The 
Hamiltonian that leads to the linear Maxwell equations is bilinear in the fields. It 
corresponds to an infinite number of harmonic oscillators. They are decoupled 
by Fourier normal mode decomposition into independent oscillators with the 
above-mentioned frequencies $\omega(\mathbf{k})$. Now the most naive generalization of canonical 
quantization of a finite number of harmonic oscillators leads to the quantum 
mechanical energies $E$ of the Maxwell Hamiltonian 

    $E - E_0 = \sum_{\mathbf{k}} n(\mathbf{k})\,\hbar\omega(\mathbf{k})$ .    (1)

To characterize such an eigenstate, occupation numbers $n(\mathbf{k}) \ge 0$ must be specified 
for all modes. The linearly spaced oscillator spectrum lends itself immediately 
to a particle interpretation of states with these energies: each mode $\mathbf{k}$ 
corresponds to a possible photon moving with momentum $\hbar\mathbf{k}$ and possessing 
energy $\hbar\omega(\mathbf{k})$, and there is a number of $n(\mathbf{k})$ such noninteracting particles. The 
energy $E_0$ is dismissed as the unobservable infinite zero-point energy of the 
empty cavity with only energy differences being physical. 

* Software included on the accompanying diskette. 

This almost trivial correspondence between oscillator levels and the addi- 
tive energies of noninteracting “objects” has been fruitful in several branches of 
physics leading to phonons, pseudo particles etc. For particle physics one usually 
removes the boundary conditions by approaching an infinite volume and thus 
deals with all of space time as the physical system under study. It is rather 
easy to set up fields together with a bilinear Hamiltonian, which describe other 
particles observed in nature with spin, Fermi statistics and certain multiplet 
structures like electrons, neutrinos, quarks etc. 

The field description acquires a much deeper role, if symmetries and inter- 
actions are incorporated. In perturbation theory, the entities described by the 
free (bilinear) field oscillator modes remain intact, and additional small cubic or 
higher monomials in the fields just add weak interactions between them. There 
is an intriguing interplay between symmetries and interactions in the sense that 
certain symmetries (gauge invariance) necessitate the presence of nonlinear terms 
of a specific structure and thus largely determine the interaction between parti- 
cles. 

In a historical process of learning and applying these principles and inject- 
ing experimental information on existing particle varieties and properties, to- 
day’s standard model of particle physics has emerged. In our present extremely 
schematic overview it corresponds to a structure of fields $\{\varphi_a(t, \mathbf{r})\}$ and their 
dynamics in the form of a Hamiltonian or, more commonly, an action $S[\varphi]$. The 
“internal” index $a$ labels a number of field components per space-time point 
encoding the observed degrees of freedom. Transformations act on this index by 
mixing field components. The action S is invariant under the experimentally de- 
termined group of such transformations, and it also contains the free parameters 
of the standard model (of order 20 numbers) that have to be fixed empirically 
by matching experiment. 



1.2 Beyond Perturbation Theory 

In the previous subsection we alluded to the perturbative view of quantum field 
theory. There one has a field of appropriate spin and charge for each species of 
particle to be described. Interactions are treated in the form of nonlinear but 
small perturbations. The perturbative terms are usually organized in correspon- 
dence with Feynman diagrams giving an intuitive feeling for their effects. The 
success story including QED, electroweak unification and perturbative quan- 
tum chromodynamics (QCD), nowadays integrated into the standard model, is 
almost entirely based on the application of this method. 

It has long been known from simple model field theories that there 
may also be predictions that are not accessible in perturbation 
theory. Stronger nonlinearities may qualitatively rearrange the spectrum of the 
field theoretic Hamiltonian, such that a free theory is not a reasonable starting 
point to perturb around. Often physical theories are benign enough to indicate 
this already within perturbation theory, for instance by “corrections” becoming 
of order 100%. A more subtle failure of perturbation theory arises if some 
quantity vanishes to all orders but truly has nonzero contributions of the type 
$\exp(-1/g^2)$, where $g$ is the expansion parameter. Both phenomena are expected 
to occur in the QCD sector of the standard model. Some of the elementary fields 
correspond to quarks which phenomenologically do not exist as free particles 
under normal conditions. This means that, unlike electrons, they never come 
out of a scattering experiment as isolated asymptotic particles. In QCD the 
term confinement has been coined for this nonperturbative rearrangement of the 
spectrum. It means that nonlinear couplings to the gluonic fields, demanded and 
determined by local SU(3) gauge invariance, produce forces which are strong and 
not decaying with separation, such that quarks cannot be isolated. As energy is 
pumped into the system, new particles are created with the consequence that one 
never sees anything but three quark states (baryons) or quark antiquark states 
(mesons). For these compounds the strongest component of the gluon force is 
neutralized. 

There clearly arises a demand to extract also the nonperturbative predictions 
from the standard model and other theories. In this way one hopes to achieve 
a detailed description of the properties of hadrons like their mass ratios and 
scattering effects. At present, only numerical methods can offer such results in 
the sense of a first principles evaluation of the field theory. We hence introduce 
this method in some detail in the following sections. 

2 Lattice Formulation of Field Theory 

In this section we introduce the formulation of euclidean quantum field theory 
on a space time lattice. Connections with statistical mechanics and critical phe- 
nomena are pointed out. A comprehensive discussion of this subject can be found 
in recent textbooks [1, 2, 3]. 

2.1 Path Integral 

Most lattice methods in field theory start from the path integral formulation. 
We briefly derive it here for a simple quantum mechanical system. We start from 
its Hamiltonian (in suitable units) 

H + ■ (2) 

Here $p$ and $q$ are the usual canonical operators with commutation relation 

    $[q, p] = i\hbar$ .    (3)



As a starting point we consider the partition function 



Z = %i 




== tr 



-eH 



e 



-eH 
= J dqi...dqN^qi e "^|g2N^2 e qsh-kN e 
~|dgi...d9jvexp|-|^ +\qf+9qt | 

- I ■ 

In this formula we have subdivided the “imaginary time” $\beta = N\epsilon$ into small 
segments $\epsilon$. The matrix elements $\langle q_i | e^{-\epsilon H} | q_{i+1} \rangle$ are only evaluated to leading 
order in $\epsilon$, hence the $\simeq$ signs. The last line is a symbolic notation meaning the 
limit $\epsilon \to 0$, $N \to \infty$ of the previous one. In this limit one imagines the $N$-fold 
integration over $q_1 \ldots q_N$ to go over into an “integration over all paths”. These 
paths $q(t)$ run from $t = 0$ to $t = \beta$ and are closed, $q_{N+1} = q_1$, $q(0) = q(\beta)$, due 
to the trace taken. 
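
The discretized exponent can be made concrete with a short Python sketch. It is our own illustration and not part of the accompanying package: it assumes units with $\hbar = 1$, an anharmonic potential $V(q) = q^2/2 + g q^4$ chosen for illustration only, and the hypothetical function names `potential` and `discretized_action`.

```python
def potential(q, g):
    # Illustrative anharmonic potential; this specific form is an assumption.
    return 0.5 * q * q + g * q ** 4


def discretized_action(path, eps, g):
    """Euclidean action of a closed path q_1..q_N with q_{N+1} = q_1 (hbar = 1)."""
    n = len(path)
    total = 0.0
    for i in range(n):
        q_now = path[i]
        q_next = path[(i + 1) % n]  # periodic closure, a consequence of the trace
        kinetic = 0.5 * ((q_next - q_now) / eps) ** 2
        total += eps * (kinetic + potential(q_now, g))
    return total


# A constant path has no kinetic contribution, so its action is beta * V(q).
beta, n_slices = 2.0, 50
eps = beta / n_slices
action_constant = discretized_action([0.3] * n_slices, eps, g=0.1)
```

Any path that varies between neighboring time slices pays a kinetic penalty of order $1/\epsilon$, which is why only reasonably smooth paths carry weight as $\epsilon$ becomes small.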

In the path integral formulation one extracts information on the system in 
the form of expectation values of functionals of the path $q(t)$, 

    $\langle F \rangle = \frac{\int Dq\, e^{-S} F[q]}{\int Dq\, e^{-S}}$ ,    (5)

where the action $S$ is the exponent in (4). A simple example is the two point 
function that gives information on the spectrum of the original Hamiltonian $H$. 
It is not difficult to show that 



n 



holds, where $E_n$ are energy levels and $E_0$ refers to the ground state. 

To generalize to field theory one would start from a Hamiltonian written 
in terms of operators $\varphi(\mathbf{r})$ for each point in space^1 instead of $q$ together with 
appropriate momenta conjugate to $\varphi$. The quantum mechanical system discussed 
above may actually be viewed as a zero dimensional field theory. It turns out 
that for the transition to the path integral the multitude of degrees of freedom 
just means another index $\mathbf{r}$ to be carried along resulting in an “integral over all 
fields” $\varphi(t, \mathbf{r})$, 

    $Z = \int D\varphi\, e^{-S}$ .    (7)

In field theory, again all dynamical information can be obtained from observables 

    $\langle F \rangle = \frac{1}{Z} \int D\varphi\, e^{-S} F[\varphi]$ .    (8)

^1 Suppressing the internal index $a$. 

Instead of spelling out field theoretic Hamiltonians in detail it is more convenient 
to immediately start from the euclidean action $S$ appearing in the path 
integral. For the prototype model of scalar $\varphi^4$ theory it is given by 

s = J I . (9) 

Here $(t, \mathbf{r})$ has been combined into a four-component vector $x_\mu$, $\mu = 0, 1, 2, 3$. 
Units with $\hbar = c = 1$ are used from here on. Apart from serving as a simplest 
example, the theory also has a role within the standard model. The famous 
Higgs sector has scalar fields with this kind of selfinteraction. Although these 
fields are coupled with fermions and gauge fields, for certain properties of the 
theory it was considered reasonable to neglect these couplings and study the 
scalar sector in isolation. Ref. [4] is a well known example for such research. 



2.2 Lattice Regularization 

At a formal level we have introduced quantum field theory as an integral over all 
fields over a four dimensional continuum with a weight given by the action $S$. To 
be concrete, from now on we consider (7), (8) with action (9). The perturbative 
treatment would start by neglecting the interaction term proportional to $g$. Then 
the integral is gaussian and can be carried out by mild generalization of finite 
dimensional gaussian integrals. One thus reproduces all results on the harmonic 
oscillators that make up free fields, that is, noninteracting particles. Interactions 
are then incorporated as power corrections in $g$. In such calculations one 
encounters the well known divergences of quantum field theory. They already 
appeared in a trivial way in the zero point energy $E_0$ in (1), which receives a 
finite amount from each degree of freedom, i.e. from each point in space. Note that 
this is not a problem deriving from infinite volume: the problem is ultraviolet, 
an infinite number of degrees of freedom per volume for a continuum. 

The standard procedure in quantum field theory is now to modify the theory 
in the small while maintaining the large scale features such that all divergences 
are regularized. With this modification a smallest length scale $a$ is introduced, 
and only as $a \to 0$ the divergences reappear (cutoff or continuum limit). In the 
regularized theory one identifies relations between physical quantities, which re- 
main finite in the cutoff limit. These become the predictions of the renormalized 
theory. 

There are various possibilities of regularization. For perturbative calculations 
dimensional regularization is most popular since it is computationally efficient 
with a relatively small number of terms at intermediate stages. It can be de- 
fined, however, only at the level of Feynman diagrams and hence is strictly 
perturbative. For both rigorous analyses on the existence of field theories and 
for numerical computations independent of perturbative expansions, the lattice 
cutoff is most useful. It allows one to regularize to an explicitly finite number of 
degrees of freedom, an obvious prerequisite to go to the computer. 

The lattice cutoff is imposed by restricting the four dimensional space-time 
continuum to an embedded lattice. The type of lattice is expected to be irrelevant 
for renormalized predictions in the continuum limit. For technical simplicity 
one usually takes a simple cubic lattice, where the cartesian components are 
restricted to integer multiples of the lattice spacing $a$. Introducing a finite volume 
at the same time, usually of linear dimensions large compared to all scales in the 
problem, one may restrict $x$ to $x_\mu = 0, a, 2a, \ldots, (L - 1)a$. Boundary conditions 
are usually taken periodic, and one thus deals with a volume $(La)^4$ and $L^4$ real 
field variables. 

The path integral naturally becomes an $L^4$-fold ordinary integral 

    $Z = \int \prod_x d\varphi(x)\, e^{-S(\varphi)}$ ,    (10)

where it remains to write $S$ as an ordinary function of the field values on the 
lattice sites. To this end some discretization is introduced of the same kind as 
for the approximation of differential equations on a grid. In the simplest case we 
replace (9) by 

XI { 2 I ■ 

X K ‘ J 



with 

    $\Delta_\mu \varphi(x) = \frac{1}{a}\bigl(\varphi(x + a\hat\mu) - \varphi(x)\bigr)$ ,    (12)

where $\hat\mu$ is a unit vector along the $\mu$-th coordinate axis. For a smooth field the 
above sum goes over into the integral at a rate proportional to $a^2$. For the lattice, 
this is the sense in which the theory, symbolically given by the continuum path 
integral (7), is modified in the small only. The continuum limit $a \to 0$ of the 
integral (10), with its growing number of integrations, is a much more nontrivial 
question. 
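
The forward difference (12) and the approach of a lattice sum to the continuum integral can be tried out on a toy case. The following Python sketch is our own illustration: it assumes a one-dimensional periodic lattice of unit length and the smooth test field $\sin(2\pi x)$, and the function names are hypothetical. For this field the continuum value of $\int dx\, \frac{1}{2}\varphi'(x)^2$ over one period is $\pi^2$.

```python
import math


def forward_difference(field, a):
    """Lattice derivative (phi(x + a) - phi(x)) / a on a periodic 1-D lattice."""
    n = len(field)
    return [(field[(i + 1) % n] - field[i]) / a for i in range(n)]


def lattice_kinetic_term(n_sites, length=1.0):
    """a * sum_x (1/2) (Delta phi(x))^2 for the test field sin(2 pi x / length)."""
    a = length / n_sites
    field = [math.sin(2.0 * math.pi * i * a / length) for i in range(n_sites)]
    return a * sum(0.5 * d * d for d in forward_difference(field, a))


continuum_value = math.pi ** 2
error_coarse = abs(lattice_kinetic_term(10) - continuum_value)
error_fine = abs(lattice_kinetic_term(20) - continuum_value)
# Halving the lattice spacing should reduce the error by roughly a factor of 4.
ratio = error_coarse / error_fine
```

In this example the deviation from the continuum value shrinks roughly by a factor of four when the spacing is halved, as expected for an error of order $a^2$.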



2.3 Field Theory and Critical Phenomena 

As we wish to talk about a continuum limit of a lattice quantum field theory, 
the question arises as to which physical scale to compare a with. One could think 
of the only other dimensionful parameter m. This, however, is just a parameter in 
the action and does not have immediate physical relevance. A physical scale has 
to be derived from observables. It can be shown [2] that the exponential decay 
length of the two point correlation function can be related to the (renormalized) 
mass of particles $m_R$ by 

    $\sum_{\mathbf{x}, \mathbf{y}} \langle \varphi(x)\varphi(y) \rangle \propto \exp(-|x_0 - y_0|\, m_R)$ .    (13)

The length $1/m_R = \xi$ is the distance over which typical fluctuations of the 
field are correlated. The continuum limit is approached where the correlation 
length in lattice units $\xi/a$ diverges. 
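
In practice the decay rate in (13) is read off from ratios of the correlation function at neighboring time separations. The Python sketch below is our own illustration with synthetic data; the mass value 0.25 and the name `effective_mass` are assumptions made for the example only.

```python
import math


def effective_mass(correlator):
    """Local decay rate ln(C(t) / C(t + 1)) of a two point function, in lattice units."""
    return [math.log(correlator[t] / correlator[t + 1])
            for t in range(len(correlator) - 1)]


# Synthetic, purely exponential correlator with an assumed mass a * m_R = 0.25.
assumed_mass = 0.25
correlator = [3.0 * math.exp(-assumed_mass * t) for t in range(10)]
masses = effective_mass(correlator)
correlation_length = 1.0 / masses[0]  # xi / a = 1 / (a * m_R)
```

With real Monte Carlo data the ratio only settles to a plateau at larger separations, where contributions of heavier states have died out, and it carries statistical errors.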

At this point the lattice formulation of quantum field theory makes contact 
with the field of critical phenomena. The partition function (10) can be regarded 
as a classical statistical system, which undergoes a second order phase transition 
at those parameter values $m, g$ where $\xi/a \to \infty$. It is known that critical points 
possess universal properties like the critical exponents. These quantities depend 
only on the dimensionality of the lattice and on the number of field components 
and their symmetry structure. A closer investigation of the correspondence re- 
veals that the renormalized relations that remain finite and meaningful in the 
continuum limit are such universal properties of the associated phase transition. 
In other words, they are independent of the detailed structure of the lattice and 
of the precise choice of the discretization of the action. Obviously, this is a nec- 
essary requirement, since these features were arbitrary from the point of view of 
particle physics. 

The lattice formulation has shown that quantum field theories of elementary 
particle physics can be mapped onto certain four dimensional classical statistical 
systems. Their variables and symmetries are dictated by experimental observa- 
tions in particle physics. Universal properties of critical points in these systems 
correspond to physical predictions in field theory. 



2.4 Effective Field Theory 

After the discovery of renormalization in QED it was first thought that physics is 
defined as a cutoff limit of regularized theories and that it is hence necessary to 
really take $a \to 0$ mathematically. But experience with other physical theories 
actually suggests being more modest. Presumably, present day field theories are a 
correct description on today’s experimental energy scale and somewhat beyond, 
but not for arbitrarily high energies and short distances. Just as the Fermi theory 
of weak interactions was a low energy approximation to the electroweak sector 
of the standard model, we expect the standard model to approximate something 
still unknown at some higher but finite energy. Then it is sufficient to take the 
cutoff energy $1/a$ to about that energy and not necessarily to infinity. 

Since the work of Wilson a picture has emerged that is visualized in Fig. 1. 
We plot the admittedly vague quantity ‘physics’ versus a logarithmic scale. At a 
cutoff scale $a$, small compared to present day physics, we have a number of different 
cutoff field theories. They can be given by different lattices, for instance. 
As one derives long distance physics predictions from them with appropriately 
adjusted parameters, they are able to give very similar predictions, which are 
hence independent of details of the cutoff as discussed before. Nature, coming 
from an unknown short distance theory, falls into the same narrow tube of almost 
fixed long distance behavior. Our ability to make predictive theories at our 
present scale is based on the fact that the possible long distance behavior seems 
to be severely restricted, once the right degrees of freedom and symmetries are 
imposed. Our cutoff-mutilated theories are just some representatives in the same 
class as nature as far as energies far below the cutoff are concerned [5, 6]. 

Fig. 1. Description of nature by effective field theories 

Wilson has argued [7] that this situation is by no means unprecedented. The 
Navier-Stokes equations of hydrodynamics, for example, could be guessed on the 
basis of symmetry and some simplicity arguments. Only later was the underlying 
microscopic (quantum) theory found, which had this limiting behavior. Now 
some of the purely empirical parameters in the Navier-Stokes theory could be 
understood to arise from the underlying theory. We may well be in a similar 
situation with the standard model of particle physics. 

3 Stochastic Evaluation of Path Integrals 

After the previous motivation, in this section we discuss how to perform nonper- 
turbative numerical calculations in lattice field theory. This amounts to comput- 
ing numerically the integral in (10) whose dimension is large, for instance 10^. 
Standard methods of numerical integration with regular grids in each dimension 
are obviously not suitable. The equivalence with statistical physics comes to the 
rescue. The real positive weight containing the action, $\exp(-S)$, is interpreted as 
the Boltzmann factor of a classical statistical system with lattice field space as 
its phase space. The analogy suggests that only a small part of the huge phase 
space, determined by the Boltzmann weight, is relevant for expectation values 
of observables of interest. The Monte Carlo method with importance sampling 
is a tool to sample this subspace. As it is a statistical method, the statistical 
errors at fixed parameters decrease only in proportion to the inverse square root 
of the invested computer time. 



3.1 Monte Carlo Method 

For the Monte Carlo [2, 8, 9] evaluation of path integrals like (10) one generates 
a sequence of lattice field configurations in the memory of a computer starting 
from an arbitrary $\varphi^{(0)}$, 

    $\varphi^{(0)} \to \varphi^{(1)} \to \varphi^{(2)} \to \cdots$ .    (14)

Here each $\to$ corresponds to the modification (update) of the stored configuration 
by a certain algorithm. By methods to be discussed later it can be achieved 
that after a certain number of configurations, say $n$, affected by transient behavior 
from the arbitrary start, the frequency of appearance of any given $\varphi$ 
in the sequence is proportional to the Boltzmann weight $\exp(-S(\varphi))$. Thus the 
dominant part of the integrand is incorporated into the sampling of the phase 
space which therefore is called importance sampling. The expectation value of 
any observable is then estimated by an average over a sufficiently long section 
of the sequence 

    $\langle F \rangle \approx E(F) = \frac{1}{N} \sum_{i=n+1}^{n+N} F[\varphi^{(i)}]$ .    (15)

The transition from $\varphi = \varphi^{(i)}$ to $\varphi' = \varphi^{(i+1)}$ is stochastic and hence completely 
characterized by the transition probabilities $T(\varphi, \varphi')$. For the correct 
sampling with respect to the weight $P(\varphi) = \exp(-S(\varphi))/Z$ the conditions of 
ergodicity and stability have to be fulfilled. Ergodicity amounts to 

    $T^k(\varphi, \varphi') > 0$ for all $k \ge k_0$    (16)

with some integer $k_0$. This ensures that all phase space can be reached from any 
start $\varphi^{(0)}$ with finite probability in a finite number of steps. Stability amounts 
to the desired distribution $P(\varphi)$ being a fixed point under updating, 

    $\int \prod_x d\varphi(x)\, P(\varphi)\, T(\varphi, \varphi') = P(\varphi')$ .    (17)

It can be proven [9] that importance sampling follows from these properties. 
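
Both conditions can be verified explicitly when the configuration space is small. The Python sketch below is our own toy illustration: it replaces the lattice field by four discrete states with made-up action values and builds the transition matrix of a Metropolis-type update, anticipating the acceptance rule of the next subsection. The name `metropolis_matrix` is hypothetical.

```python
import math


def metropolis_matrix(action):
    """Transition matrix T[i][j] of a Metropolis update on a finite state space.

    Proposal: one of the other states, chosen uniformly; accepted with
    probability min(1, exp(-(S_j - S_i))).
    """
    n = len(action)
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                accept = min(1.0, math.exp(-(action[j] - action[i])))
                t[i][j] = accept / (n - 1)
        t[i][i] = 1.0 - sum(t[i])  # rejected proposals keep the old state
    return t


action = [0.0, 0.7, 1.5, 0.2]  # toy values of S for four states
n = len(action)
z = sum(math.exp(-s) for s in action)
p = [math.exp(-s) / z for s in action]  # Boltzmann weights exp(-S) / Z
t = metropolis_matrix(action)

# Stability: sum_i P(i) T(i, j) should reproduce P(j).
p_after = [sum(p[i] * t[i][j] for i in range(n)) for j in range(n)]

# Ergodicity: the state of highest action always moves on, so T has a zero
# diagonal entry, but every entry of T^2 is positive (k_0 = 2 in this toy case).
t2 = [[sum(t[i][k] * t[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
reachable_in_two_steps = all(entry > 0.0 for row in t2 for entry in row)
```

For a lattice field the matrix is of course never written down; the example only makes the two conditions tangible.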

The efficiency of a valid Monte Carlo algorithm is an independent question. 
In general, sampled configurations are correlated with each other. These correlations 
decay exponentially with the separation in the sequence and possess a 
characteristic decay time $\tau$ in units of update steps of the algorithm in use. Statistical 
errors are reduced most by independent estimates.^2 More precisely, there 
is an error formula for estimates (15) of the following type, 

    $\langle [E(F) - \langle F \rangle]^2 \rangle = \frac{\langle F^2 \rangle - \langle F \rangle^2}{N/(2\tau_F)}$ .    (18)

The numerator is the variance of $F$, and $\tau_F$ in the denominator is an effective 
autocorrelation time that incorporates the coupling of the observable $F$ to the 
various (and not only the slowest) modes in the spectrum of $T$. We see that $\tau_F$ 
is the characteristic scale for the length of the simulation $N$ and this is usually 
also true for the initial equilibration or thermalization part of the sequence that 
is discarded. Clearly, $n, N \gg \tau_F$ is necessary. Apart from a few exceptionally 
favorable cases in certain spin models, all $\tau_F$ for known algorithms diverge in 
the continuum limit with characteristic dynamical exponents, $\tau_F \propto \xi^z$. A lot 
of research in the field of Monte Carlo is devoted to the search for algorithms 
with reduced $\tau$ and $z$. 
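
An error formula of the type (18) can be tried out on synthetic data. The Python sketch below is our own illustration and makes several assumptions: the autocorrelation time is estimated as $\frac{1}{2}$ plus the sum of the normalized autocorrelation function up to a fixed window, so that independent data give $\tau = 1/2$, and the "measurements" are generated by a simple first-order autoregressive recursion rather than by a lattice simulation. All function and variable names are hypothetical.

```python
import math
import random


def integrated_autocorrelation_time(series, window):
    """tau = 1/2 + sum_{t=1}^{window} rho(t), rho = normalized autocorrelation."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered) / n
    tau = 0.5
    for t in range(1, window + 1):
        cov = sum(centered[i] * centered[i + t] for i in range(n - t)) / (n - t)
        tau += cov / variance
    return tau


def squared_error(variance, n_measurements, tau):
    """Error formula of type (18): the variance divided by N / (2 tau)."""
    return variance / (n_measurements / (2.0 * tau))


# Synthetic correlated data x_{i+1} = c * x_i + noise (illustrative only).
rng = random.Random(2024)
coupling, n_measurements = 0.9, 50000
series = [0.0]
for _ in range(n_measurements - 1):
    series.append(coupling * series[-1] + rng.gauss(0.0, 1.0))

tau = integrated_autocorrelation_time(series, window=60)
mean = sum(series) / n_measurements
variance = sum((x - mean) ** 2 for x in series) / n_measurements
naive_error = math.sqrt(variance / n_measurements)
corrected_error = math.sqrt(squared_error(variance, n_measurements, tau))
```

For this recursion the exact value is $\tau = 1/2 + c/(1 - c) = 9.5$, so the naive error that ignores correlations is too small by a factor of about $\sqrt{2\tau} \approx 4$. The fixed window is a simplification; in a real analysis it has to be chosen large compared to $\tau$ but small compared to $N$.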



3.2 Metropolis Algorithm for $\varphi^4$ 

The local Metropolis algorithm is probably the most universal algorithm in lattice 
field theory. It is almost always available, but for many models it is inferior 
to other known schemes. Locality means that an update pass is composed of 
steps $T_x$ for all lattice sites, where only the field at site $x$ is modified and all 
other values are maintained. In the simplest case of $\varphi^4$ one proposes a new value 

    $\varphi'(x) \in [\varphi(x) - s, \varphi(x) + s]$ ,    (19)

where $s$ is a width parameter to be determined by efficiency and the proposal is 
drawn from the interval with a flat distribution. Now one computes the action 
difference 

    $\Delta S = S(\varphi') - S(\varphi)$ ,    (20)

where $\varphi'$ is the configuration with the new proposal at $x$. Of course, $\Delta S$ is only a 
local sum of terms around $x$ where the configurations differ, with the remaining 
terms canceling. Now the proposal is accepted as the new configuration in the 
sequence with probability 

    $p = \min(1, \exp(-\Delta S))$ .    (21)

A sequence of such steps for all sites (in some order) is called one sweep through 
the lattice. It fulfills the conditions of ergodicity and stability and is hence a 
legal Monte Carlo algorithm. The proposal width $s$ has to be tuned such that 
the monitored acceptance probability is not pathological, that is neither 
close to 0 nor to 1. It is usually easy to achieve values between 0.3 and 0.7 which 
are close to optimal empirically. 
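
The steps (19) to (21) can be sketched in a few lines of Python. This is our own minimal illustration, not the FORTRAN package: it assumes a one-dimensional periodic lattice and a lattice action of the form $S = \sum_x [-2\kappa\,\phi(x)\phi(x+1) + \phi(x)^2 + \lambda(\phi(x)^2 - 1)^2]$, and the parameter values and function names are chosen for the example only.

```python
import math
import random


def local_action(phi, neighbor_sum, kappa, lam):
    """Terms of the assumed lattice action that involve the value phi at one site."""
    return (-2.0 * kappa * phi * neighbor_sum
            + phi * phi + lam * (phi * phi - 1.0) ** 2)


def metropolis_sweep(field, kappa, lam, width, rng):
    """One sweep over a periodic 1-D lattice; returns the acceptance rate."""
    n = len(field)
    accepted = 0
    for x in range(n):
        neighbor_sum = field[(x + 1) % n] + field[(x - 1) % n]
        old = field[x]
        new = old + rng.uniform(-width, width)  # flat proposal, cf. (19)
        delta_s = (local_action(new, neighbor_sum, kappa, lam)
                   - local_action(old, neighbor_sum, kappa, lam))  # cf. (20)
        # Accept with probability min(1, exp(-delta_s)), cf. (21).
        if delta_s <= 0.0 or rng.random() < math.exp(-delta_s):
            field[x] = new
            accepted += 1
    return accepted / n


rng = random.Random(12345)
field = [0.0] * 16
rates = [metropolis_sweep(field, kappa=0.2, lam=0.5, width=1.5, rng=rng)
         for _ in range(200)]
acceptance = sum(rates) / len(rates)
```

Only the two neighbors of a site enter the action difference, which is the locality referred to above. If the monitored `acceptance` comes out too close to 0 or 1, the width is adjusted and the run repeated.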



^2 Barring the possibility of anticorrelations that usually cannot be achieved. 

Dynamical exponents of $z \approx 2$ are found with this technique. For $\varphi^4$ and 
many other cases including lattice gauge theory, one can substantially improve 
to $z \approx 1$ by mixing Metropolis with certain microcanonical steps. The latter 
fail to be ergodic by themselves, but the mixture is. This combination is called 
the overrelaxation technique, and it represents the state of the art for pure lattice 
gauge theory. For more details, we refer to [10]. In Appendix A a subroutine 
package for a simulation of scalar theory is described. 

4 Summary 

In these lecture notes we tried to outline the role of field theory for particle 
physics, the lattice formulation of field theory and the numerical technique of 
Monte Carlo simulation for its solution beyond perturbation theory. Since this 
is a wide area, emphasis was first put on a few principles and the more concrete 
discussion was restricted to the simplest example of scalar field theory. Of 
course, more interesting with respect to the standard model is the inclusion of 
gauge fields. For this and other topics we refer to [2] for more introduction and 
to the proceedings of the annual lattice conferences, for instance [11], for infor- 
mation on the present day status of research. At the Heraeus school a discussion 
of the ongoing project of the computation of the strong coupling constant 
was given, for which we refer to the review [12]. 

To conclude, lattice field theory and numerical simulation continue to be 
a very active field. Somewhat later than anticipated when it all started about 
15 years ago, we are only now more and more coming to simulations that are 
relevant to experimental data. The computational problem is enormous, and it 
requires both any amount of (parallel) computing power available and 
continued research on improving the efficiency of our methods. 

Appendix: FORTRAN Monte Carlo Package for $\phi^4$ 

Included with this volume is a collection of small FORTRAN 77 programs that 
allow the simulation of $\phi^4$ theory on a torus with the Metropolis algorithm as 
described in this article. 

In numerical context, a parameterization of the theory somewhat different 
from (11) is customary and advantageous. The action is written as 

    $S = \sum_x \Bigl\{ -2\kappa \sum_\mu \phi(x)\phi(x + \hat\mu) + \phi(x)^2 + \lambda\,(\phi(x)^2 - 1)^2 \Bigr\}$ .    (22)

By comparing (up to irrelevant constants) with (11) one can see that both forms 
are equivalent if the following identifications are made: 

    $a\varphi = \sqrt{2\kappa}\,\phi$    (23)
    g = 2X1    (24)
    $(am)^2 = (1 - 2\lambda)/\kappa - 2(D + 1)$ .    (25)

The parameter $D$ is the space dimension to be set to 3 in the physical case. In 
the package we allow for a general value to be set as a parameter. An unphysical 
value $D = 1$ is advisable for short experiments on workstations. With $D = 0$ 
one can even simulate the anharmonic oscillator. 

The new parameterization is in terms of dimensionless parameters with all 
lengths and energies expressed as multiples of powers of the lattice spacing $a$ 
(lattice units). For $\lambda = g = 0$ there is no interaction, and criticality is reached 
for $m = 0$ or $\kappa = 1/(2(D + 1))$. In this case, the theory exists only for $\kappa$ approaching 
the critical value from below. The renormalized mass $m_R$ is known exactly and 
is given in the main program for comparison with simulations. For $\lambda \to \infty$ the 
integrations become concentrated at $\phi = \pm 1$, and the model becomes identical 
with the Ising model ($2\kappa$ is the usual $\beta$). Here the critical $\kappa$ is known with high 
precision. For $0 < \lambda < \infty$ no exact solution is available, but the critical curve in 
$(\kappa, \lambda)$ is expected to smoothly interpolate. The region above the critical line is 
the so-called broken phase, where the symmetry $\phi \to -\phi$ is broken. It is more 
difficult to understand and simulate, and should be considered only later [2]. 
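
The relation (25), $(am)^2 = (1 - 2\lambda)/\kappa - 2(D + 1)$, is easily turned into a small helper for choosing simulation parameters. The Python sketch below is our own illustration; the function names are hypothetical and not those of the FORTRAN package.

```python
def bare_mass_squared(kappa, lam, space_dim):
    """(a m)^2 from relation (25): (1 - 2 lambda) / kappa - 2 (D + 1)."""
    return (1.0 - 2.0 * lam) / kappa - 2.0 * (space_dim + 1)


def free_critical_kappa(space_dim):
    """Value of kappa at which (a m)^2 vanishes for lambda = 0."""
    return 1.0 / (2.0 * (space_dim + 1))


kappa_c = free_critical_kappa(3)  # D = 3
mass_sq_at_critical = bare_mass_squared(kappa_c, 0.0, 3)
mass_sq_below = bare_mass_squared(0.1, 0.0, 3)  # kappa below kappa_c
```

For $D = 3$ the free critical value is $\kappa = 1/8$, and any smaller $\kappa$ gives a positive $(am)^2$, in line with the statement that the free theory exists only below the critical value.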

A very useful test on the correctness of the program is furnished by a nontrivial 
observable whose expectation value is always known exactly. By scaling 
all integration variables in the lattice path integral the following equipartition 
type identity is easy to derive 

    $1 = \Bigl\langle -4\kappa \sum_\mu \phi(x)\phi(x + \hat\mu) - 2(2\lambda - 1)\,\phi(x)^2 + 4\lambda\,\phi(x)^4 \Bigr\rangle$ .    (26)

The site x here is arbitrary due to translation invariance, and one may also sum 
over it to improve the statistics. 
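
The scaling argument can be sketched as follows. We assume here that the lattice action has the form $S = \sum_x \{ -2\kappa \sum_\mu \phi(x)\phi(x+\hat\mu) + \phi(x)^2 + \lambda(\phi(x)^2 - 1)^2 \}$ and write $V$ for the number of lattice sites; the scale factor $s$ and the symbol $V$ are our notation.

```latex
% Rescale every integration variable, \phi(x) \to s\,\phi(x); Z cannot depend on s.
Z = \int \prod_x d\phi(x)\, e^{-S(\phi)}
  = s^{V} \int \prod_x d\phi(x)\, e^{-S(s\phi)}

% Differentiate with respect to s at s = 1 and divide by Z:
0 = V - \Big\langle \sum_x \phi(x)\, \frac{\partial S}{\partial \phi(x)} \Big\rangle
\quad\Longrightarrow\quad
1 = \Big\langle \phi(x)\, \frac{\partial S}{\partial \phi(x)} \Big\rangle
\quad \text{(translation invariance)}

% For the assumed action:
\phi(x)\, \frac{\partial S}{\partial \phi(x)}
  = -2\kappa \sum_\mu \phi(x)\,\bigl[\phi(x+\hat\mu) + \phi(x-\hat\mu)\bigr]
    + 2(1 - 2\lambda)\,\phi(x)^2 + 4\lambda\,\phi(x)^4
```

Under the expectation value the forward and backward neighbor terms are equal by translation invariance, which combines them into the factor $-4\kappa$ multiplying the sum over $\mu$.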

In numerical experiments it is recommended to first study the free case $\lambda = 0$ 
and to increase $\kappa$ starting from small values to see the mass $m_R$ in lattice units 
become small. Then one may slowly raise $\lambda$. Note that the local Metropolis 
algorithm for $\phi^4$ will not be efficient in the Ising limit. 

A package of routines necessary for some test simulations of the scalar theory 
is included with the proceedings of the Heraeus school. They are also available by 
anonymous ftp from linde.physik.hu-berlin.de in directory pub/heraeus. A 
README file and ample comments supply a further introduction to the programs. 

Abstract. In this chapter I introduce a Monte Carlo method for the simulation of 
systems in continuum that uses global moves for the propagation in phase space. This 
is then applied to a new general model for the simulation of dense macromolecular 
systems. It consists basically of ellipsoidal-shaped units strung together to form chains, 
including branched and side chains. The ellipsoidal-shaped unit can vary in its principal 
axes and degenerate to a sphere, allowing for flexible modeling of the monomer units. 
I also touch on the parallelization of such systems. 



1 Introduction 

Conventional methods for the simulation of dense polymer systems have various 
shortcomings. Among the most severe is that not all methods generate con- 
figurations from a canonical ensemble. A further shortcoming of the methods 
employing the integration of the equations of motion, in one form or another, is 
the dependence of the computed observables on the time-step size. In principle 
an extrapolation to vanishing step size is necessary to obtain an unbiased result. 

While Monte Carlo methods do not have the shortcomings listed above they 
lack the global updating entailed by the integration of motion. To obtain reason- 
able acceptance rates, the attempted changes in the configuration (moves) are 
in general local. A segment of a chain is displaced and the new position either 
rejected or accepted. 

The hybrid Monte Carlo method (HMC) is a step towards the development of 
simulation methods that are exact, in the sense that observables do not depend 
on step size, that yield the correct statistical-mechanical ensemble, 
and that change a configuration globally. 

2 Brief Review of the Simulation Method 

In conventional Monte Carlo (MC) calculations of condensed matter systems 
such as an $N$-particle system with a Hamiltonian $\mathcal{H} = U$, where $U$ denotes 
the potential energy, only local moves (displacement of a single particle) are 
made [1, 2, 3]. Updating more than one particle typically results in a prohibitively 
low average acceptance probability $\langle P_A \rangle$. This implies large relaxation times 
and high autocorrelations especially for macromolecular systems. In a Molecular 
Dynamics (MD) simulation, with $\mathcal{H} = T + U$, on the other hand, global moves are 
made. The MD scheme, however, is prone to errors and instabilities due to the 
finite time-step size. In order to introduce temperature into the microcanonical 
context, isokinetic MD schemes are often used [3]. However, they do not yield 
the canonical probability distribution, unlike Monte Carlo calculations. 

The hybrid Monte Carlo (HMC) method [4, 5, 6] combines the advantages of 
molecular dynamics and Monte Carlo methods: it allows for global moves (which 
essentially consist in integrating the system through phase space); HMC is an 
exact method, i.e., the ensemble averages do not depend on the step size chosen; 
algorithms derived from the method do not suffer from numerical instabilities 
due to finite step size as MD algorithms do; and temperature is incorporated in 
the correct statistical mechanical sense. 

The application of the hybrid Monte Carlo method has been proposed [5] for 
condensed-matter systems and investigated for atomic fluids. In this chapter the 
method will be described briefly and applied to macromolecular systems. 

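In an HMC scheme global moves can be made while keeping the average
acceptance probability $\langle P_A \rangle$ for a move high. This can be achieved as follows.
One global move in configuration space consists in integrating the system through
phase space for a fixed time $t$ using some discretization scheme ($\delta t$ denotes
the step size)

$$ g^{\delta t} : \mathbb{R}^{6N} \to \mathbb{R}^{6N}, \qquad (x,p) \mapsto g^{\delta t}(x,p) =: (x',p') $$

of Hamilton's equations

$$ \frac{dx}{dt} = \frac{\partial \mathcal{H}}{\partial p}, \qquad \frac{dp}{dt} = -\frac{\partial \mathcal{H}}{\partial x}. \tag{1} $$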

At the beginning of each global Monte Carlo step the initial momenta are drawn
from a Gaussian distribution at inverse temperature $\beta$,

$$ P_{\rm Gaussian}(p) \propto \exp\Big(-\beta \sum_j \frac{p_j^2}{2m}\Big), \tag{2} $$

and it can be shown [5] that the acceptance probability is then

$$ P_A[(x,p) \to g^{\delta t}(x,p)] = \min\{1, e^{-\beta\,\delta\mathcal{H}}\}, \tag{3} $$

where

$$ \delta\mathcal{H} = \mathcal{H}[g^{\delta t}(x,p)] - \mathcal{H}(x,p) $$

is the discretization error associated with the discretization scheme $g^{\delta t}$. Provided
the discretization scheme is time reversible and area preserving, detailed balance
is satisfied [5]. Thus the HMC algorithm generates a Markov chain with a
Boltzmann distribution as the stationary probability distribution. The probability
distribution is entirely determined by the detailed balance condition; therefore
neither the distribution nor any ensemble averages depend on the step size $\delta t$
chosen.

However, the average acceptance probability $\langle P_A \rangle$, because of (3), depends
on the average discretization error $\langle \delta\mathcal{H} \rangle$ and hence does depend on $\delta t$. Increasing
the step size will result in a lower average acceptance probability $\langle P_A \rangle$. Varying
$\delta t$, the average acceptance probability $\langle P_A \rangle$ can thus be adjusted to minimize
autocorrelations of observables.

The Metropolis transition probability is really composed of two probabilities
[4]:

$$ P_M[(x,p) \to (x',p')] = P_C\, P_A , $$

i.e., a proposal probability for a configuration and one for its acceptance. As
long as we accept the configuration with the Hamiltonian $\mathcal{H}$ we can use any other
Hamiltonian $\mathcal{H}'$ to derive equations of motion for the proposal. Specifically, we
can use $\mathcal{H}' = a\mathcal{H}$ with a scale factor $a$. This choice is motivated by the
observation [8] that the effect of the discretization of the original Hamiltonian
on the equations of motion is to renormalize the original Hamiltonian. Thus,
instead of the original Hamiltonian a new scaled one is solved exactly. This
observation can be used to scale out, to a certain degree, the effect of the
discretization error on the acceptance rate and to accelerate the algorithm.
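
The scheme described in this section can be illustrated with a minimal sketch. It is not the implementation used for the polymer systems of this chapter: the one-dimensional harmonic potential, the leapfrog integrator, the function names and all parameter values are assumptions chosen only to keep the example short. Each global move draws fresh Gaussian momenta, integrates Hamilton's equations with a time-reversible, area-preserving scheme, and accepts the result with probability min{1, exp(-beta * dH)}.

```python
import math
import random


def leapfrog(x, p, force, dt, n_steps, mass=1.0):
    """Time-reversible, area-preserving leapfrog integration of Hamilton's equations."""
    x, p = list(x), list(p)
    f = force(x)
    for _ in range(n_steps):
        p = [pi + 0.5 * dt * fi for pi, fi in zip(p, f)]   # half kick
        x = [xi + dt * pi / mass for xi, pi in zip(x, p)]  # drift
        f = force(x)
        p = [pi + 0.5 * dt * fi for pi, fi in zip(p, f)]   # half kick
    return x, p


def hmc_move(x, potential, force, beta, dt, n_steps, rng, mass=1.0):
    """One global move: new Gaussian momenta, integration, Metropolis test on dH."""
    sigma = math.sqrt(mass / beta)
    p = [rng.gauss(0.0, sigma) for _ in x]
    h_old = potential(x) + sum(pi * pi for pi in p) / (2.0 * mass)
    x_new, p_new = leapfrog(x, p, force, dt, n_steps, mass)
    h_new = potential(x_new) + sum(pi * pi for pi in p_new) / (2.0 * mass)
    delta_h = h_new - h_old                      # discretization error
    if delta_h <= 0.0 or rng.random() < math.exp(-beta * delta_h):
        return x_new, True
    return x, False


# Hypothetical test system: a single harmonic degree of freedom, U = x^2 / 2.
def harmonic_potential(x):
    return 0.5 * sum(xi * xi for xi in x)


def harmonic_force(x):
    return [-xi for xi in x]


def run(dt, n_moves=20000, n_steps=4, beta=1.0, seed=1):
    """Return the estimate of <x^2> and the measured acceptance rate."""
    rng = random.Random(seed)
    x, accepted, x2_sum = [0.0], 0, 0.0
    for _ in range(n_moves):
        x, ok = hmc_move(x, harmonic_potential, harmonic_force, beta, dt, n_steps, rng)
        accepted += ok
        x2_sum += x[0] ** 2
    return x2_sum / n_moves, accepted / n_moves
```

Calling `run(dt=0.25)` and `run(dt=1.0)` should give estimates of the mean square displacement that agree with 1/beta within statistical error for both step sizes, while the acceptance rate is lower for the larger step. This is the behavior stated in the text: the ensemble averages do not depend on the step size, only the acceptance rate does.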

3 Modeling of Polymer Systems 

The simulation of dense macromolecular systems is virtually impossible if one 
takes into account all degrees of freedom and interactions of a chemically real- 
istic chain [10, 11, 12]. For a single chain one may very well use a chemically 
realistic description. For long and many chains the computational complexity is 
overwhelming. Not even the fastest supercomputers on the horizon or beyond 
will be able to deliver enough computational power to deal with a dense system 
with all chemical detail present. It is therefore imperative to reduce the com- 
plexity in order to make the simulation a tractable approach. This reduction of 
the complexity of the model is called the coarse-graining of the model. In the 
coarse-grained approach, the detailed chemistry enters only in the derivation 
of the potential between new interacting units. These are substitutes for the 
original detailed chemistry. The system is considered on mesoscopic scales. 

A step into this direction is the parametrization of both the intra- and in- 
termolecular interactions by pair potentials. This is an approximation starting 
from an ab initio quantum mechanical calculation with full chemical detail. The 
ansatz of united atoms to reduce the number of atoms and thus reduce the 
complexity of the calculation is also a step in this direction [24]. Formally, the
unification of, for example, the two H atoms with a C atom in polyethylene
represents a coarse-graining. This has two facets. On the one hand, the reduction
in the number of elements decreases the computational complexity. This reduction
makes a longer observation time possible. The reduction in the complexity
also allows an increase in the linear dimension of the system.

On the other hand, coarse-graining eliminates those degrees of freedom
that enter into macroscopic properties only through their cooperative effect and
replaces them with effective degrees of freedom. Both facets make feasible the
simulation of complex macromolecular systems to predict macroscopic properties.

To discretize the molecular system altogether is another possibility [13]. The 
continuum chains are mapped to chains on a lattice. This step alone reduces 
the computational complexity. To further reduce the degrees of freedom one 
can perform a coarse-graining of a chain. This approach has been successful in 
treating polycarbonates [14, 15, 16, 17, 18].

The coarse-graining ansatz [14, 15], in our view, is the only route to simula- 
tion of macroscopic properties of polymer materials. It combines the requirement 
of a simplified model that one can handle computationally with the requirement 
of representing the degrees of freedom that are necessary for the macroscopic 
properties one seeks to predict. 

Following the same theme, I describe here a model in which those degrees of
freedom that do not enter into macroscopic properties, or enter only through
their cooperative effect, have been eliminated. The new component in the model is the
topology, which results in a special geometry. Whereas the usual approach in the
united-atom or coarse-grained models is to take the building blocks as spherically
symmetric atoms, here ellipsoidal units give rise to nonspherical interactions [25].
Using this building block, chains of various connectivities can be constructed.
The model allows the use of spherical and ellipsoidal “atoms” or building blocks
to form a chain. Thus the model is able to accommodate and model even complex
and asymmetric monomer units in a rather simple way. An example of
such a model is given in [23].

4 Coarse-Graining

The starting point of the model is the coarse-graining ansatz as developed by 
Paul et al. [14, 15, 22]. To present the model along with an application to a
specific material, I describe the approach for Bisphenol-A-Polycarbonate (BPA- 
PC) and a variation of the polycarbonate. 

In the coarse-graining approach we want to map larger units of the realistic 
chain onto one or more units of a new chain such that the interactions between 
the new units reflect and mimic those of the chemically realistic chain. The units 
we map can be the repeating unit of the chain, or parts of the repeating unit. 
The decision on the size or which atoms take part in one unit can be based on 
the coherence and persistence length of the chain. 

Consider three atoms bonded to form a simple chain. We want to keep, for
example, the average end-to-end distance or the radius of gyration invariant. The
possible variation in the monomer lengths and the monomer angles between the
three atoms can be well represented by just two atoms with rescaled monomer
length and monomer angle potentials if we also rescale the zero-temperature
monomer length and monomer angle.

Once the length scale is fixed, atoms of the unit are taken as base points.
From the possible conformations of the chain we find the distributions for the
monomer lengths and monomer angles between the base points along the chemically
realistic chains. The distribution contains information on the local structure
on scales smaller than the fixed length scale.

To develop a coarse-grained model monomer (in our case for BPA-PC) we 
proceed in three steps. As input we use the results of ab initio calculations of 
the geometry and the torsional potentials. Furthermore, we use monomer length 
and monomer angle distributions that have been determined via Monte Carlo 
simulations from the ab initio results [23]. 

1. We compute discrete potentials around one and two atomistic monomer 
units. 

2. From the distributions mentioned above we determine the bonded coarse- 
grained interactions. 

3. From the results of item 1 we determine the nonbonded coarse-grained
interaction constants.

One caveat to this approach must be made here. The chains generated by the 
Monte-Carlo procedure are in vacuum. The intermolecular interaction and its 
effect on the chain conformations is not taken into account. Only intramolecular 
interactions are used for the generation of the conformations so far. In principle 
an intermolecular interaction can be worked into the generation of the confor- 
mations. 

The ab initio calculations mentioned above showed us that in the case of 
Bisphenol-A-Polycarbonate, which is considered here, the torsional potential is
negligibly small. Thus we introduce no torsional interaction. In general, torsional 
interaction must and can be included. 

5 The Monomer Unit 

Our approach starts from the general ellipsoid model for the description of poly- 
mer chains [25]. In this model the basic building block is an ellipsoid. Up to 
now our monomers show rotational symmetry along the backbone. Symmetry- 
breaking side groups can be modeled by sticking degenerate ellipsoids (spheres) 
to the main ellipsoids. Of course, the ellipsoid itself need not be rotationally 
symmetric. We can also use oblate ellipsoids and in general ellipsoids with all 
three principal axes different. This, however, makes the calculation of the
nonbonded interaction more difficult.

An ellipsoid can easily be adapted to different monomer structures. We only
have to change the half axes and radii according to the size of the chemically
realistic monomers. In the simplest form we have no side groups at all and the
chain consists of rotational ellipsoids whose longer half axes can assume values
according to a given monomer-length distribution. The position of the monomers
along the same chain is furthermore determined by the monomer angles. The
short half axes are chosen so that the volume of the ellipsoids corresponds, in
the case of BPA-PC, to the volume of the chemical monomer unit.



6 Bonded Interactions for BPA-PC 

The bonded potentials for the lengths of the monomer units and the angle be- 
tween two of them are obtained in our approach from a coarse-graining proce- 
dure [22]. No torsion potential is needed as the distribution of torsion angles is 
almost uniform. The result of the coarse-graining is a distribution of lengths and 
angles between the new building blocks. The input for the determination of the 
coupling is the distribution of lengths (angles) of chemically detailed monomer 
units [23]. This distribution is obtained from a generation of chains with the
detailed chemistry and interaction using Monte Carlo simulations and shows two
clear peaks. It must be kept in mind, however, that the distribution does not
contain effects from the packing and intermolecular interaction in a dense system.
Consistent with this we may view these interactions as given by a sum of
Gaussians. The correlations between monomer length and angle and intermolecular
interaction are neglected and the coupling constants are simply determined from
the second moments of the distributions.

In our case of BPA-Polycarbonate we used an average distribution for the
backbone atoms of the carbonate group, i.e. the center of mass of the atoms
labeled O1, C1 and O2 of one monomeric unit and the corresponding center of
mass of the succeeding monomeric unit (see Fig. 1). The carbonate groups may
be regarded as joints along the polymer chain.

Performing a simultaneous fit of two Gaussians to the monomer-length
distribution as described in [23], we obtain the four parameters

$$ \langle l_{01} \rangle \ \text{and} \ \langle \delta l_{01}^2 \rangle \tag{4} $$

$$ \langle l_{02} \rangle \ \text{and} \ \langle \delta l_{02}^2 \rangle \tag{5} $$

representing the two average monomer lengths ($\langle l_{01} \rangle$ and $\langle l_{02} \rangle$) and the two
variances ($\langle \delta l_{01}^2 \rangle$ and $\langle \delta l_{02}^2 \rangle$) characterizing the width of the distributions. For
simplification, no crossing of monomers between the two distributions is
permitted. The distributions are temperature dependent and the fitting must be
carried out for each simulation temperature independently.

Along the same line the parameters for the monomer-angle distribution

$$ \langle \theta_0 \rangle \ \text{and} \ \langle \delta\theta_0^2 \rangle \tag{6} $$

are obtained from the fit of a single Gaussian [23].




264 Dieter W. Heermann 




Fig. 1. Position of atoms in the BPA-PC monomer. Two adjacent monomers are shown 



These parameters must be fitted to the parameters of the model Hamiltonian

$$ \mathcal{H}_{\rm bond} = \frac{1}{2} k_1 \sum_i S_i\,(l_i - l_{01})^2 + \frac{1}{2} k_2 \sum_i (1 - S_i)\,(l_i - l_{02})^2 + \frac{1}{2} k_\theta \sum_i (\cos\theta_i - \cos\theta_0)^2 , \tag{7} $$

where $S_i$ is 1 for monomers belonging to the first distribution ($l_{01}$) and 0 for
monomers of the second distribution ($l_{02}$). Analysis of cross-correlations between
monomer-length and monomer-angle distributions reveals that they can
be neglected. Therefore as a first approximation we identify $\langle l_{01} \rangle$ with $l_{01}$ and
$k_B T / \langle \delta l_{01}^2 \rangle$ with $k_1$.
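
The identification of the coupling constants with the second moments can be sketched as follows. This is only an illustration under stated assumptions: the sample lengths are invented, the Boltzmann constant is set to one (reduced units), and the function names are hypothetical. The first function estimates the equilibrium length and the spring constant from a set of sampled monomer lengths belonging to one peak; the second evaluates the bonded energy in the form of the model Hamiltonian (7).

```python
import statistics

K_B = 1.0  # Boltzmann constant in reduced units (assumption)


def length_coupling(samples, temperature):
    """Return (l0, k) with l0 the mean and k = k_B T / variance of the samples."""
    mean = statistics.fmean(samples)
    variance = statistics.pvariance(samples, mu=mean)
    return mean, K_B * temperature / variance


def bond_energy(lengths, flags, cos_angles, l01, l02, k1, k2, k_theta, cos_theta0):
    """Bonded energy: two harmonic length terms selected by the flags S_i plus an angle term."""
    energy = 0.0
    for length, s in zip(lengths, flags):
        energy += 0.5 * k1 * s * (length - l01) ** 2
        energy += 0.5 * k2 * (1 - s) * (length - l02) ** 2
    for c in cos_angles:
        energy += 0.5 * k_theta * (c - cos_theta0) ** 2
    return energy


# Invented sample of monomer lengths from one peak of the distribution.
l0, k = length_coupling([0.9, 1.0, 1.1], temperature=1.0)
```

Since the fitted distributions are temperature dependent, such a fit would have to be repeated for every simulation temperature.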

This form of interaction has already been used in other models for Polyethy- 
lene [19] and BPA-Polycarbonate [14, 15, 16, 17, 18]. An a posteriori test of the 
distribution using simulated systems confirms the correctness of this ansatz. The 
simulations led to distributions that indeed showed the skewed form of the ab 
initio distribution. 



7 Parallelization of the Polymer System 

Fig. 2. Schematic picture of the parallelization of the polymer system. In
this implementation polymers are not the units that are passed but atoms

The molecular dynamics part of the method outlined above is fairly general
and can be considered on general grounds, except that we are dealing with
long-chain molecules. If we were to consider only atoms, then we could apply the
domain decomposition concept. For polymer systems, if we are to keep a single
chain as the “unit” then the computational box cannot be partitioned into
many subvolumes, because the chain may stretch over many such subvolumes. From
this point of view it is not advantageous to use a data structure and integration
algorithms that work on the basis of the connectivity knowledge. Rather we
should have the atom as the basic data structure. Then we can also apply the
ideas developed for the domain decomposition to the polymer system. The price
paid here is a higher administrative overhead. The algorithm cannot be based
on the connectivity, and every “atomic data structure” must carry information
on the chain connectivity. What is passed around is this data structure with the
accumulated results for the force on that particular “atom”. This also implies
that there must be some mechanism that ensures that after a certain number of
steps all information has been gathered to guarantee a position update in terms of
an integration step.
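
A minimal sketch of such an “atomic data structure” is given below. It is an illustration only: the class layout, the slab decomposition along one axis and all names are assumptions, not the data structure of the actual program. Each atom carries the indices of its bonded neighbours and a force accumulator, so that atoms can be assigned to subvolumes without reference to the chain they belong to.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    index: int
    position: tuple                                   # (x, y, z)
    bonded_to: list = field(default_factory=list)     # indices of bonded neighbours
    force: list = field(default_factory=lambda: [0.0, 0.0, 0.0])  # accumulated force


def build_linear_chain(n_atoms, bond_length):
    """Hypothetical helper: a straight linear chain along the x axis."""
    atoms = [Atom(i, (i * bond_length, 0.0, 0.0)) for i in range(n_atoms)]
    for i in range(n_atoms - 1):
        atoms[i].bonded_to.append(i + 1)
        atoms[i + 1].bonded_to.append(i)
    return atoms


def owner_domain(atom, box_length, n_domains):
    """Slab decomposition along x: index of the subvolume that owns the atom."""
    x = atom.position[0] % box_length
    return min(int(x / box_length * n_domains), n_domains - 1)


def distribute(atoms, box_length, n_domains):
    """Assign every atom to a subvolume; a chain may be spread over many of them."""
    domains = [[] for _ in range(n_domains)]
    for atom in atoms:
        domains[owner_domain(atom, box_length, n_domains)].append(atom)
    return domains


chain = build_linear_chain(8, 1.0)
domains = distribute(chain, box_length=8.0, n_domains=4)
```

In this example the single chain is spread over all four subvolumes, and bonds such as the one between atoms 1 and 2 cross a subvolume boundary. Branched chains or networks would only require additional entries in `bonded_to`.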

This approach also facilitates the application of the program to many different
types of polymer systems. Since the algorithm makes no reference to the chain
connectivity (all information of this nature is hidden in the program and
built into the data structure), one is able to simulate systems ranging from simple
linear chains to branched chains or even networks.

Abstract. The basic features of computer simulations for fluids are presented based
on the molecular dynamics approach. Technical details, in particular the interaction 
potentials and the scaling of physical variables, the application of constraints to the 
dynamics, e.g., in order to simulate a thermostat, and the choice of integrators for the 
numerical solution of the equations of motion are discussed. Some examples which can 
be used as numerical exercises are outlined. The method of nonequilibrium molecular 
dynamics (NEMD) is introduced. Simulations of relaxation phenomena are mentioned 
briefly. The main emphasis is on simulations of a plane Couette flow as an example of a 
stationary transport process. Procedures to extract rheological properties, such as the 
(non-Newtonian) viscosity, normal pressure differences, and information on shear-flow- 
induced structural changes, are given, firstly for fluids composed of spherical particles. 
This comprises simple liquids and dense colloidal dispersions, in which the states far 
away from equilibrium studied in NEMD are accessible in experiments. Secondly, sim- 
ulations for complex fluids, in particular polymeric melts, nematic and smectic liquid 
crystals, as well as ferrofluids and magnetorheological fluids are discussed. 



1 Introduction 

Transport and relaxation processes in fluids have been studied by nonequilibrium 
molecular-dynamics (NEMD) computer simulations for more than twenty years 
[1, 2]. By now the method is well established [3]-[12]. Physical phenomena in 
simple fluids far away from equilibrium and the material properties of complex 
fluids have been analyzed in recent years. The NEMD results provide information 
and insight into microscopic mechanisms of nonequilibrium phenomena similar 
to that available in real experiments, e.g. in light, x-ray or neutron scattering. 
However, even more detailed information is available in molecular dynamics, 
e.g. snapshots showing the positions of the particles or specific contributions to 
physical quantities of which only the sum can be measured in the real experiment. 
Thus computer simulations can often yield sharper tests of the assumptions 
underlying conventional theories than a real experiment where, in addition, some 
uncertainty regarding the particle-particle interaction always remains. 

The maintenance of a stationary nonequilibrium state and, in particular, of
a constant kinetic energy, as required to simulate a thermostat, imposes constraints
on the dynamics. After some general remarks on molecular dynamics, potentials
and the scaling of physical variables, methods for simulating a thermostat are
presented. Some other technical details, including the use of various integrators
for the numerical solution of the equations of motion, are discussed.

Relaxation phenomena can be observed when a system is prepared in a 
nonequilibrium state and then allowed to relax to equilibrium. Examples are 
the decay of an initially crystalline positional order into a fluid state and stress 
relaxation after a sudden change of shape of the volume containing the particles.
As an example of a stationary transport process a plane Couette flow, also
referred to as “simple shear flow”, is considered in some detail. Results are
presented for “simple” and for “complex” fluids. The method of NEMD simulations
is firstly discussed for fluids composed of spherical particles. The flow behavior 
of the so-called simple fluids and the shear-induced structural changes are not 
simple. A comparison with experimental results of (dense) colloidal dispersions 
of spherical particles can be made. The complex fluids studied are polymeric 
melts and anisotropic fluids such as nematic liquid crystals (LC) and ferrofluids, 
magneto- or electrorheological (MR or ER) fluids. 



2 Basics of Molecular Dynamics 



2.1 Equations of Motion 

In a molecular dynamics computer simulation for a substance composed of $N$
spherical particles Newton's equations of motion

$$ m \frac{d^2}{dt^2}\mathbf{r}^i = \mathbf{F}^i \tag{1} $$

are integrated numerically. The particles are located at positions $\mathbf{r}^i$ in a volume
$V$. The particle density is $n = N/V$. The particle $i$ ($i = 1, 2, \ldots, N$) feels the
force $\mathbf{F}^i = \sum_{j \neq i} \mathbf{F}^{ij}$, which is the sum of the forces $\mathbf{F}^{ij}$ exerted by all other particles
$j \neq i$ on particle $i$. When one wants to avoid surface effects, periodic boundary
conditions and the nearest-image convention are used. This means that particle $i$
feels the force caused either by particle $j$ or by one of its images, depending on
which one is closest to it.
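
The nearest-image convention can be sketched in a few lines. The cubic box of edge length `box_length` and the function name are assumptions made for this illustration.

```python
def minimum_image(ri, rj, box_length):
    """Separation vector from particle j, or its nearest periodic image, to particle i."""
    separation = []
    for a, b in zip(ri, rj):
        d = a - b
        d -= box_length * round(d / box_length)  # shift to the closest image
        separation.append(d)
    return separation
```

For two particles at x = 0.5 and x = 9.5 in a box of length 10, the function returns a separation of 1 along x rather than 9, i.e. the interaction with the nearest image across the boundary is used.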

The temperature $T$ of the system is linked with the part of the kinetic energy
$K$ which is not associated with a macroscopic motion:

$$ \frac{3}{2} N k_B T = K = \frac{m}{2} \sum_i \mathbf{c}^i \cdot \mathbf{c}^i , \tag{2} $$

where $k_B$ is the Boltzmann constant and $\mathbf{c}^i$ is the “peculiar velocity”, i.e., the
velocity of particle $i$ relative to the flow velocity $\mathbf{v} = \mathbf{v}(\mathbf{r}^i)$. The center of mass
of the system of $N$ particles is constant. Thus the number of degrees of freedom
in three dimensions is $3(N - 1)$ rather than $3N$.

In order to simulate an isothermal system the temperature has to be kept
constant. The simplest version of a “thermostat” consists of rescaling the
peculiar velocity after each time step by the factor $(T_{\rm wanted}/T_{\rm measured})^{1/2}$. Other
thermostats, e.g., those referred to as “Gaussian” and “Nosé-Hoover” [7], are
imposed as constraints and will be discussed in one of the following sections.

2.2 Extraction of Data from MD Simulations 

The observables of interest, such as the internal energy and the components of 
the pressure or the stress tensor, can be calculated from the known positions and 
velocities of the particles as time averages according to the rules of statistical 
physics. Typically, in a stationary state, the required data are extracted after 
each tenth to hundredth time step. Similarly, more detailed information can be 
obtained from the simulation such as the velocity distribution function, the pair 
correlation function or the static structure factor, which can also be measured 
in scattering experiments. 

In ordinary molecular dynamics (MD) simulations data are extracted when 
an equilibrium state with a specified density n (or pressure) and temperature 
T has been reached. In addition, dynamic quantities can be extracted from the 
temporal fluctuations, e.g., transport coefficients can be calculated from time 
correlation functions with the help of Green-Kubo relations. In nonequilibrium 
molecular dynamics (NEMD) simulations, on the other hand, relaxation and 
transport phenomena are investigated more directly and in close analogy to real 
experiments. States far away from equilibrium are also studied. In the following, 
some relaxation phenomena are mentioned and a stationary plane Couette flow 
is considered in more detail as a special case of a transport process. 

Before NEMD simulations are discussed further, however, the presentation 
of some elementary material on potentials from which the forces are derived, on 
constraints and integrators is appropriate. Characteristic features of the dynam- 
ics can be studied in simple exercises involving one or two particles in one or 
two dimensions (without periodic boundary conditions). 

3 Potentials, Constraints, and Integrators 

3.1 Interaction Potential and Scaling 

In the simulations dimensionless or “scaled” variables are used which are denoted
by the same symbols as the physical variables when no danger of confusion exists.
For a system of particles whose forces are derived from a binary interaction
potential, a standard choice is the Lennard-Jones (LJ) potential

$$ \Phi^{\rm LJ}(r) = 4\Phi_0 \left[ \left(\frac{r_0}{r}\right)^{12} - \left(\frac{r_0}{r}\right)^{6} \right] . \tag{3} $$

Lengths and energies are presented in units of the diameter $r_0$ and of the potential
depth $\Phi_0$. The units used for the particle density and for the temperature are
$r_0^{-3}$ and $\Phi_0/k_B$. The time is scaled with the reference time $t_0 = r_0 (m/\Phi_0)^{1/2}$, where $m$
is the mass of a particle. The pressure, the shear rate and the viscosity of the LJ
fluid are expressed in units of $\Phi_0 r_0^{-3}$, $t_0^{-1}$ and $\Phi_0 r_0^{-3} t_0$, respectively. For many
fluids composed of atoms or small molecules, the specific values of $r_0$ and $\Phi_0$ that
are needed to relate the theoretical results to the thermophysical properties of
specific substances are available. In the simulations, the cutoff of the interaction
at a finite distance $r_c$ is often achieved just by putting the potential and the
force equal to zero for $r > r_c$, e.g., with $r_c = 2.5\,r_0$. A smoother cutoff whereby
the force is (at least) continuous at $r_c$, however, is preferable for the integration
of the equations of motion. The “LJ-spline” potential of Evans and Holian
is an example of a modified LJ potential with an improved cutoff. This potential
is equal to the LJ potential for $r < r_i$, where $r_i = (26/7)^{1/6}\, r_0 \approx 1.244\, r_0$ is
determined by the point of inflection, $\Phi''(r_i) = 0$, and is given by the third-order spline
expression

$$ \Phi^{\rm LJspline} = \Phi_0 \left[ s_2 \left(\frac{r}{r_0} - \frac{r_c}{r_0}\right)^2 + s_3 \left(\frac{r}{r_0} - \frac{r_c}{r_0}\right)^3 \right], \qquad r_i < r < r_c , \tag{4} $$

and, of course, $\Phi^{\rm LJspline} = 0$ for $r > r_c$. The coefficients $s_2$, $s_3$ and $r_c$, determined
such that the interaction potential as well as its first and second derivatives are
continuous at $r_i$, are given by

$$ s_2 = -\frac{24192}{3211} \left(\frac{r_0}{r_i}\right)^2, \qquad s_3 = -\frac{387072}{61009} \left(\frac{r_0}{r_i}\right)^3, \qquad r_c = \frac{67}{48}\, r_i \approx 1.737\, r_0 . \tag{5} $$



An even smoother cutoff where not only the force but also its first derivative
vanish continuously at $r_c$ is the potential $\Phi^{\rm LJsmoc}$ which, between $r_i$ and $r_c$, is
given by

$$ \Phi^{\rm LJsmoc} = \ldots , \qquad r_i < r < r_c , \tag{6} $$

and, of course, also $\Phi^{\rm LJsmoc} = 0$ for $r > r_c$. Here one has

$$ \ldots , \qquad r_c \approx 1.901\, r_0 . \tag{7} $$



A simpler short-range potential with a smooth cutoff and parameters chosen
such that the force at $r = r_0$ and the well depth are equal to the corresponding
values of the LJ potential is

$$ \Phi^{\rm shrat} = 24\,\Phi_0 \left(1 - \frac{r}{r_0}\right) \left(\frac{r_c - r}{r_c - r_0}\right)^3 , \qquad r < r_c = \frac{113}{81}\, r_0 \approx 1.395\, r_0 , \tag{8} $$

and, of course, $\Phi^{\rm shrat} = 0$ for $r > r_c$. The minimum occurs at $r = r_{\min} =
(89/81)\, r_0 \approx 1.099\, r_0$, which is somewhat smaller than the corresponding value
$2^{1/6}\, r_0 \approx 1.1225\, r_0$ of the LJ potential. The LJ potential and the three other
potential functions with a finite cutoff discussed here are displayed in Fig. 1.

Fig. 1. Potentials $\Phi$ (in units of $\Phi_0$) as functions of the distance $r$ (in units of $r_0$). The
Lennard-Jones (LJ) potential which extends farthest to the right is compared with the
two smoothly cutoff versions “LJspline” (second from left) and “LJsmoc” (second from
right). The thick curve is the short-range attractive “shrat” potential function

When only the repulsive “$r^{-12}$” part of the LJ interaction potential is taken
into account one speaks of a “soft spheres” (SS) potential. The LJ potential
cut off at its minimum $r_{\min} = 2^{1/6}\, r_0 \approx 1.123\, r_0$, which is also purely repulsive, is
referred to as WCA potential. An interaction potential which, at $r = r_0$, has the
same value and force as the WCA potential can be introduced in analogy to (8):

$$ \Phi^{\rm shrep} = \Phi_0 \left(\frac{r_c - r}{r_c - r_0}\right)^3 , \qquad r < r_c = \frac{9}{8}\, r_0 = 1.125\, r_0 , \tag{9} $$

and $\Phi^{\rm shrep} = 0$ for $r > r_c$. The simple and smoothly cut off potentials (8) and (9)
are recommended for test simulations with typical values of the kinetic energy
per particle $\frac{3}{2} k_B T$ smaller than about $5\,\Phi_0$.
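
The potentials of this section are easily coded and checked. The following sketch works in reduced units ($r_0 = \Phi_0 = 1$). The function names are hypothetical, and the factors of $r_0/r_i$ in the spline coefficients as well as the relation $r_c = (67/48)\, r_i$ are taken as assumptions consistent with the quoted numerical values; the code can be used to verify the continuity conditions stated above.

```python
R_I = (26.0 / 7.0) ** (1.0 / 6.0)      # inflection point of the LJ potential
R_C_SPLINE = 67.0 / 48.0 * R_I         # cutoff of the LJ-spline potential
S2 = -24192.0 / 3211.0 / R_I ** 2      # spline coefficients in reduced units
S3 = -387072.0 / 61009.0 / R_I ** 3
R_C_SHRAT = 113.0 / 81.0               # cutoff of the shrat potential


def lj(r):
    """Lennard-Jones potential."""
    return 4.0 * (r ** -12 - r ** -6)


def lj_force(r):
    """Magnitude of the LJ force, -dPhi/dr."""
    return 24.0 * (2.0 * r ** -13 - r ** -7)


def lj_spline(r):
    """LJ potential up to the inflection point, cubic spline up to the cutoff."""
    if r < R_I:
        return lj(r)
    if r < R_C_SPLINE:
        x = r - R_C_SPLINE
        return S2 * x * x + S3 * x ** 3
    return 0.0


def shrat(r):
    """Short-range attractive potential with smooth cutoff."""
    if r >= R_C_SHRAT:
        return 0.0
    return 24.0 * (1.0 - r) * ((R_C_SHRAT - r) / (R_C_SHRAT - 1.0)) ** 3


def shrat_force(r):
    """Magnitude of the shrat force, -dPhi/dr."""
    if r >= R_C_SHRAT:
        return 0.0
    u = (R_C_SHRAT - r) / (R_C_SHRAT - 1.0)
    return 24.0 * u ** 3 + 72.0 * (1.0 - r) * u ** 2 / (R_C_SHRAT - 1.0)
```

With these definitions the shrat force at r = 1 equals the LJ force there, the shrat minimum at r = 89/81 has depth one, and the spline joins the LJ potential continuously at the inflection point.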



3.2 Thermostats 

Gaussian Thermostat. A system with a vanishing average flow velocity, i.e.,
with $\mathbf{v} = 0$ is considered now; then $\mathbf{c}^i = \dot{\mathbf{r}}^i$. The condition that the kinetic
energy $K = (m/2) \sum_i \mathbf{c}^i \cdot \mathbf{c}^i$ be constant (isokinetic system) is a nonholonomic
constraint which can be taken into account by supplementing the equations of
motion for particle $i$ with a constraint force $\mathbf{Z}^i$:

$$ m \frac{d^2}{dt^2}\mathbf{r}^i = \mathbf{F}^i + \mathbf{Z}^i . \tag{10} $$

Gauss's principle, which states that $\sum_i \mathbf{Z}^i \cdot \mathbf{Z}^i$ is extremal, hopefully minimal,
leads to $\mathbf{Z}^i \sim \mathbf{c}^i$. The proportionality coefficient is put equal to $-m\zeta$, where
the pseudo-friction coefficient $\zeta$ can be positive or negative, depending on the
dynamic state of the system. More specifically, the Gaussian thermostat consists
of replacing the equations of motion by:

$$ m \frac{d^2}{dt^2}\mathbf{r}^i = \mathbf{F}^i - m \zeta\, \mathbf{c}^i \tag{11} $$

with

$$ \zeta = \zeta_G := \frac{\sum_j \mathbf{F}^j \cdot \mathbf{c}^j}{m \sum_k \mathbf{c}^k \cdot \mathbf{c}^k} . \tag{12} $$

The equations of motion remain time-reversal invariant provided that the force
$\mathbf{F}^i$ guarantees this property. Notice that

$$ 2 K \zeta_G = -\frac{dE^{\rm pot}}{dt} , \tag{13} $$

where $E^{\rm pot}$ is the potential energy. The Gaussian thermostat keeps the kinetic
energy constant. Its desired value $K_0$ has to be initially assigned, e.g., by rescaling
the initial velocities.



Velocity Rescaling. The frequently used method of rescaling the magnitude of
the velocities after each integration time step is equivalent to the Gaussian
thermostat. To show this, note that rescaling by the factor $(T_{\rm wanted}/T_{\rm measured})^{1/2} =
(K_0/K)^{1/2}$, where $K_0$ and $K$ are the wanted and the measured values of the
kinetic energy, is equivalent to replacing, after each time step of length $\delta t$, the
velocity $\mathbf{c}^i$ by $\mathbf{c}^i + \delta\mathbf{c}^i$ with

$$ \delta\mathbf{c}^i = \left[ (K_0/K)^{1/2} - 1 \right] \mathbf{c}^i \approx -\frac{1}{2} \frac{\delta K}{K}\, \mathbf{c}^i = -\zeta_G\, \mathbf{c}^i\, \delta t . \tag{14} $$

Here, $\delta K = K - K_0$ is the change of the kinetic energy over one time step and
$|\delta K| \ll K$ is assumed. The last equality in (14) follows from the fact that the
total energy $K + E^{\rm pot}$ remains constant, hence $\delta K = -\delta E^{\rm pot}$, and $dE^{\rm pot}/dt =
\delta E^{\rm pot}/\delta t$ was used together with (13). For small enough time steps $\delta t$, the Gaussian constraint force
occurring in (11) implies an extra change of the velocity given by (14). Thus the
velocity rescaling is equivalent to the action of a Gaussian thermostat.
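
Both ingredients can be written down in a few lines. The sketch below is an illustration with hypothetical function names; velocities and forces are stored as lists of Cartesian component lists, and the particle mass defaults to one (reduced units).

```python
import math


def kinetic_energy(velocities, mass=1.0):
    """K = (m/2) sum_i c^i . c^i for a list of peculiar velocity vectors."""
    return 0.5 * mass * sum(c * c for v in velocities for c in v)


def rescale_velocities(velocities, k_wanted, mass=1.0):
    """Multiply all velocities by (K_0 / K)^(1/2) so that K equals k_wanted."""
    factor = math.sqrt(k_wanted / kinetic_energy(velocities, mass))
    return [[factor * c for c in v] for v in velocities]


def gaussian_friction(forces, velocities, mass=1.0):
    """Pseudo-friction coefficient of the Gaussian thermostat, sum F.c / (2K)."""
    power = sum(f * c for fv, cv in zip(forces, velocities) for f, c in zip(fv, cv))
    return power / (2.0 * kinetic_energy(velocities, mass))
```

With the friction coefficient computed in this way, the rate of change of the kinetic energy, which is the sum over particles of c · (F − mζc), vanishes identically; this is the isokinetic constraint.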



Nosé-Hoover Thermostat. This thermostat consists of supplementing the
equations of motion (11) with an equation of change for the coefficient $\zeta$:

$$ \frac{d\zeta}{dt} = \frac{1}{\tau_{\rm NH}^2} \left( \frac{K}{K_0} - 1 \right) . \tag{15} $$

As before, $K = (m/2) \sum_i \mathbf{c}^i \cdot \mathbf{c}^i$ and $K_0$ are the actual and the prescribed values
of the kinetic energy. Furthermore, $\tau_{\rm NH}$ is a relaxation time determining the
speed of response of the thermostat. It is an extra parameter occurring in the
simulation. Of course stationary and, in particular, equilibrium properties should
not be affected by the choice of $\tau_{\rm NH}$. On the other hand, dynamic phenomena
occurring on a time scale comparable with $\tau_{\rm NH}$ are modified by the action of the
thermostat. Again, time reversal invariance is not destroyed by the thermostat.
The limiting cases $\tau_{\rm NH} \to \infty$, $\zeta = 0$ and $\tau_{\rm NH} \to 0$, $\zeta = \zeta_G$ correspond to
isoenergetic (adiabatic) and Gaussian isothermal simulations, respectively.







Fig. 2. The orbits x2 vs x1 and the velocity orbits c2 vs c1 for unconstrained (top)
and Gaussian isokinetic (bottom) motions



Thermostats in Action. Next, as an exercise, the effects of the thermostats
are studied by comparing constrained and unconstrained solutions for a simple
two-dimensional one-particle problem. More specifically, a particle with mass $m$
is subjected to an external force determined by the short-range attractive (shrat)
potential (8). The Cartesian components of the position vector and of the velocity
are denoted by x1, x2 and c1, c2. In Fig. 2, the orbits x2 versus x1 and the velocity
orbits c2 versus c1, as calculated by NDSolve of Mathematica, are displayed for
the unconstrained system and for the case of a Gaussian thermostat. The initial
conditions are, in terms of reduced, dimensionless variables: x1 = 1.25, x2 = 0,
c1 = 0, c2 = 1, in both cases. The curves are shown for the reduced times
$t$ with $0 < t < 8$. In the absence of the constraint, the particle gains kinetic
energy after the start. This is not allowed when the Gaussian thermostat is in
action. For this reason, the particle does not travel as far as in the unconstrained
case. The same holds true for the Nosé-Hoover thermostat (Fig. 3), where the
corresponding orbits are presented for $\tau_{\rm NH} = 0.1$ (in reduced units). In this
case, the instantaneous kinetic energy still varies and the velocity orbit shows a
behavior intermediate between the cases compared in Fig. 2.
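
The exercise can also be reproduced without Mathematica. The following sketch is one possible variant, not the calculation used for the figures: it replaces NDSolve by a classical fourth-order Runge-Kutta step with a fixed step size of 0.001 (an assumption), places the force center of the shrat potential at the origin (an assumption), and uses reduced units with unit mass. The function names are hypothetical.

```python
import math

R_C = 113.0 / 81.0  # cutoff of the shrat potential in units of r0


def shrat(r):
    """Short-range attractive potential (8) in reduced units."""
    if r >= R_C:
        return 0.0
    return 24.0 * (1.0 - r) * ((R_C - r) / (R_C - 1.0)) ** 3


def shrat_force(x1, x2):
    """Cartesian components of the central force derived from the shrat potential."""
    r = math.hypot(x1, x2)
    if r >= R_C:
        return 0.0, 0.0
    u = (R_C - r) / (R_C - 1.0)
    f = 24.0 * u ** 3 + 72.0 * (1.0 - r) * u ** 2 / (R_C - 1.0)  # -dPhi/dr
    return f * x1 / r, f * x2 / r


def derivative(state, thermostat):
    """Right-hand side of the equations of motion, with or without Gaussian thermostat."""
    x1, x2, c1, c2 = state
    f1, f2 = shrat_force(x1, x2)
    zeta = (f1 * c1 + f2 * c2) / (c1 * c1 + c2 * c2) if thermostat else 0.0
    return (c1, c2, f1 - zeta * c1, f2 - zeta * c2)


def rk4_step(state, dt, thermostat):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = derivative(state, thermostat)
    k2 = derivative(shifted(k1, 0.5 * dt), thermostat)
    k3 = derivative(shifted(k2, 0.5 * dt), thermostat)
    k4 = derivative(shifted(k3, dt), thermostat)
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate(thermostat, t_end=8.0, dt=0.001):
    """Integrate from the initial conditions x1 = 1.25, x2 = 0, c1 = 0, c2 = 1."""
    state = (1.25, 0.0, 0.0, 1.0)
    for _ in range(round(t_end / dt)):
        state = rk4_step(state, dt, thermostat)
    return state
```

With `thermostat=True` the kinetic energy should stay at its initial value 1/2 up to integration errors, whereas with `thermostat=False` the total energy should be conserved instead. Storing the intermediate states would give the orbits x2 versus x1 and c2 versus c1.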





Fig. 3. The orbits x2 vs x1 and the velocity orbits c2 vs c1 for a Nosé-Hoover
thermostat



The time dependence of the pseudo-friction coefficient $\zeta$ corresponding to the
Gaussian dynamics studied in the lower part of Fig. 2 and the Nose-Hoover
thermostat, cf. Fig. 3, is shown in Fig. 4. Note that $\zeta$ assumes positive and negative
values of equal magnitude. The total energy $E$ varies according to $\dot{E} = -2\zeta E^{kin}$.
For comparison, in the case of an unconstrained motion, i.e., for $\zeta = 0$
corresponding to the top orbits of Fig. 2, the maximum deviation of the energy from
its initial value that is caused by inaccuracies of the numerical integration is less
than 7 x 10~^, in reduced units. 

Other Constraints. Simulations in which the pressure or a specific component
of the pressure tensor is constant can be, and have been, performed by imposing
an appropriate constraint on the dynamics of the N-particle system. Various
procedures analogous to the “rescaling”, Gaussian and Nose-Hoover methods of
thermostats were applied [7]-[9].

Different types of constraints are encountered when one wants to model 
molecules composed of atoms. Simple examples are dumbbells and chain 

Fig. 4. The pseudo-friction coefficient $\zeta$ (in reduced units) as function of the
(dimensionless) time $t$ for the Gaussian (left) and the Nose-Hoover (right) thermostats



molecules. When the binding forces are derived from an extra potential, the
high-frequency oscillations of the bonds may require shorter time steps. This is
avoided by the use of scleronomic constraints which keep the bond lengths
constant. For the dumbbell, the constraint force is usually calculated in textbooks
of mechanics starting from the d’Alembert principle. Application of the Gauss
principle to this problem yields the same result. An example of the simulation
of chain molecules in a polymeric melt is presented later.



3.3 Integrators 

Runge-Kutta and Predictor-Corrector Methods. The classical (second
and higher order) Runge-Kutta methods used to integrate ordinary differential
equations require more than one calculation of the forces per time step.
For this reason, predictor-corrector methods, which need only one of the
time-consuming force evaluations per time step, became rather popular in MD and
NEMD simulations [4, 16]. On the other hand, simple integrators referred to as
“Stoermer-Verlet” and “velocity Verlet” methods used in the early days of MD
simulations are still applied today. These and some “symplectic” integrators will
be discussed in the following sections.



“Stoermer-Verlet” and “Velocity-Verlet” Methods. Consider the second
order differential equation

$\ddot{x} = F(x)$ . (16)

The Stoermer-Verlet method is based on approximating the second order time
derivative by $[x(t + \delta t) - 2x(t) + x(t - \delta t)]/(\delta t)^2$ where $\delta t$ is the integration
time step. The “force” $F$ on the r.h.s. of (16) is taken as $F(x(t))$. With the
identifications $x_{new} = x(t + \delta t)$, $x_{old} = x(t - \delta t)$, the integrator
corresponds to

$x_{new} = 2x - x_{old} + F(x)\,(\delta t)^2$ . (17)

Since $x$ and the velocity $v$ rather than $x$ and $x_{old}$ are given initially, $x_{old}$ has
to be calculated, e.g., by $x_{old} = x - v\,\delta t + 0.5 F(x)(\delta t)^2$ before (17) can be
applied for a sequence of time steps. The velocity $v$, to be approximated by
$v = (x_{new} - x_{old})/(2\delta t)$, is not involved in the integration procedure and thus
cannot be controlled by rescaling. The velocity-Verlet method, on the other
hand, does involve the velocity. This procedure is based on

$x_{new} = x + v\,\delta t + 0.5 F(x)\,(\delta t)^2$ , $v_{new} = v + 0.5\,(F(x) + F(x_{new}))\,\delta t$ . (18)

In Fortran notation, (18) is equivalent to the three consecutive statements
(“leap-frog” scheme):

$v = v + 0.5 F(x)\,\delta t$ , $x = x + v\,\delta t$ , $v = v + 0.5 F(x)\,\delta t$ , (19)

where each statement overwrites its left-hand side, so that the second statement
uses the half-updated velocity and the force in the last statement is evaluated
at the new position.
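As an illustration of (17) to (19), the two schemes can be sketched in Python
as follows. This is a minimal example for a single degree of freedom with unit
mass; the function names and the harmonic test force are assumptions made for
the illustration and are not taken from the text.

```python
import math

def stoermer_verlet(x, v, force, dt, n_steps):
    """Position-only scheme (17); x_old is built from the initial velocity."""
    x_old = x - v * dt + 0.5 * force(x) * dt ** 2
    for _ in range(n_steps):
        x_new = 2.0 * x - x_old + force(x) * dt ** 2
        x_old, x = x, x_new
    return x

def velocity_verlet(x, v, force, dt, n_steps):
    """Scheme (18) written as the three overwriting statements of (19)."""
    f = force(x)
    for _ in range(n_steps):
        v += 0.5 * f * dt      # half kick with the force at the old position
        x += v * dt            # drift with the half-updated velocity
        f = force(x)           # the only force evaluation of this time step
        v += 0.5 * f * dt      # half kick with the force at the new position
    return x, v

def harmonic(x):
    """Test force of a harmonic oscillator with unit frequency."""
    return -x

n = 1000
dt = 2.0 * math.pi / n         # integrate over one oscillation period
x_sv = stoermer_verlet(1.0, 0.0, harmonic, dt, n)
x_vv, v_vv = velocity_verlet(1.0, 0.0, harmonic, dt, n)
print(x_sv, x_vv, v_vv)
```

Both routines need one force evaluation per time step and generate the same
sequence of positions up to round-off; only the velocity-Verlet form carries
the velocity along, which is what a rescaling thermostat requires.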



Symplectic Integrators. The Hamilton equations for the coordinates $q$ and
momenta $p$ are:

$\frac{dq}{dt} = G \equiv \frac{\partial H}{\partial p}$ , $\frac{dp}{dt} = F \equiv -\frac{\partial H}{\partial q}$ . (20)

Since $\partial G/\partial q + \partial F/\partial p = 0$, the phase-space volume is conserved. The symplectic
integrators guarantee this property. When constraints imposed, for example, by
a stationary transport process lead to a decrease of the phase-space volume, one
is sure that the integrator as such does not contribute to it. In the following, the
special case $G = p$, $F = F(q)$ is considered. A symplectic integrator of order $M$
determines the new values of $q$ and $p$ after one time step $\delta t$ by a sequence of $M$
consecutive changes $\delta q$ and $\delta p$ according to:

$\delta p = b_1 F\,\delta t$ , $\delta q = c_1 p\,\delta t$ , $\delta p = b_2 F\,\delta t$ , ..., $\delta q = c_M p\,\delta t$ . (21)

The coefficients $b_i$ and $c_i$ have the property $\sum_i b_i = \sum_i c_i = 1$ and are
chosen to optimize the conservation of energy. The simple case $b_1 = 0$, $b_2 = 1$,
$c_1 = c_2 = 0.5$ corresponding to the scheme

$\delta q = 0.5\,p\,\delta t$ , $\delta p = F\,\delta t$ , $\delta q = 0.5\,p\,\delta t$ , (22)

is called “si2.a” in [17]. It is essentially equivalent to the Verlet integrator. An
example of a third order integrator, denoted by “si3.b” [17], involves the
coefficients:

$b_1 = 0.2683301 = c_3$ , $b_2 = -0.1879916 = c_2$ , $b_3 = 0.9196615 = c_1$ . (23)

Notice that some coefficients are negative. A fifth-order method, termed “si4.c”
[17], uses the coefficients

$b_1 = 0.06175885813563$ , $c_1 = 0.20517766154229$ ,

$b_2 = 0.33897802655364$ , $c_2 = 0.40302128160421$ ,

$b_3 = 0.61479130717558$ , $c_3 = -0.12092087633891$ , (24)

$b_4 = -0.14054801465937$ , $c_4 = 0.51272193319241$ ,

$b_5 = 0.12501982279453$ , $c_5 = 0$ .

The difference $\delta E_{max} = E_{max} - E_{min}$ between the maximum and minimum
values $E_{max}$ and $E_{min}$ of the energy during an integration interval $0 < t < t_{end}$ can
be used as a measure for the efficiency of an integrator. When comparing different
integrators, the number of force evaluations, which is the main computational load
in MD simulations, should be equal. For this reason, one may introduce a force
time step $\delta t_0$ and put $\delta t = L\,\delta t_0$ where $L$ is the number of force evaluations
needed per integration time step $\delta t$. For the si2.a, as well as for the Verlet
integrators, one has $L = 1$, whereas $L = 3$ and $L = 4$ for the si3.b and si4.c
integrators. As an exercise, it is suggested that one determines $\delta E_{max}$ for the
harmonic oscillator with the simple Hamiltonian $H = 0.5(p^2 + q^2)$ integrated
over one period corresponding to $t_{end} = 2\pi$, as a function of $\delta t_0$, and compares
the various integrators discussed here. For a slightly more complicated problem,
viz. the one-dimensional motion of a particle with mass $m = 1$ in the
short-range attractive shrat potential (8), such a comparison is presented in Fig. 5.
The initial position and velocity, expressed in variables analogous to those of
the problem studied in the previous numerical example, are $x_0$ = a/6, $c_0 = -1$. The
particle starts outside the interaction region, flies towards the force center where
it is first attracted, then feels the repulsion, is reflected and, at the time $t_{end} = 1$,
is at a position close to where it started. This corresponds to a head-on collision.
The curves displayed from top to bottom are for the si2.a and si3.b schemes, a fifth
order integrator proposed in [18], and the si4.c integrator. Notice that $\delta E_{max}$ is
plotted versus $1/\delta t_0$ and logarithmic scales are used; the value $\delta t_0 = 0.0025$, for
example, corresponds to about 2.6 on the horizontal axis. Small force time steps
leading to smaller values for the energy inaccuracy are on the right. Clearly, the
use of better integrators can improve the accuracy at no extra computational
cost!
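The exercise suggested above can be set up along the following lines. This is
a sketch, not a reference implementation: it applies the update sequence (21)
with the coefficient sets quoted in (22) to (24) to the harmonic oscillator
$H = 0.5(p^2 + q^2)$. The start values $q = 1$, $p = 0$, the dictionary layout and
the function names are assumptions made for the illustration.

```python
import math

# name: (b coefficients, c coefficients, force evaluations L per time step)
SCHEMES = {
    "si2.a": ((0.0, 1.0), (0.5, 0.5), 1),
    "si3.b": ((0.2683301, -0.1879916, 0.9196615),
              (0.9196615, -0.1879916, 0.2683301), 3),
    "si4.c": ((0.06175885813563, 0.33897802655364, 0.61479130717558,
               -0.14054801465937, 0.12501982279453),
              (0.20517766154229, 0.40302128160421, -0.12092087633891,
               0.51272193319241, 0.0), 4),
}

def symplectic_step(q, p, force, dt, b, c):
    """One time step as the sequence dp = b_i F dt, dq = c_i p dt of (21)."""
    for b_i, c_i in zip(b, c):
        # For brevity the force is recomputed even where b_i = 0 or where q
        # has not changed; a production code would reuse it.
        p += b_i * force(q) * dt
        q += c_i * p * dt
    return q, p

def energy_spread(name, dt0, t_end=2.0 * math.pi):
    """E_max - E_min for the harmonic oscillator, at force time step dt0."""
    b, c, n_force = SCHEMES[name]
    dt = n_force * dt0                  # integration step dt = L * dt0
    n_steps = max(1, round(t_end / dt))
    q, p = 1.0, 0.0                     # assumed start values
    energies = [0.5 * (p * p + q * q)]
    for _ in range(n_steps):
        q, p = symplectic_step(q, p, lambda x: -x, dt, b, c)
        energies.append(0.5 * (p * p + q * q))
    return max(energies) - min(energies)

for dt0 in (0.04, 0.02, 0.01, 0.005):
    row = ", ".join(f"{name}: {energy_spread(name, dt0):.3e}" for name in SCHEMES)
    print(f"dt0 = {dt0}: {row}")
```

Plotting the logarithm of the printed values against the logarithm of $1/\delta t_0$
gives curves of the kind shown in Fig. 5, here for the oscillator rather than
for the shrat collision.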

4 Nonequilibrium Phenomena 

4.1 Relaxation Processes 

Relaxation processes can be studied in MD simulations by starting from an in- 
tentionally prepared nonequilibrium state. Then the approach to equilibrium 




Fig. 5. The maximum energy deviation $\delta E_{max}$ versus the reciprocal of the force time
step $\delta t_0$ for a head-on collision with the “shrat” potential. The four curves, from top to
bottom, are for the si2.a and si3.b schemes, a fifth order integrator proposed by Hoover
et al. [18] and the si4.c integrator



is observed by analyzing the time dependence of (instantaneous) macroscopic
variables such as the potential contributions to the internal energy and the
pressure, or quantities which are closer to a microscopic description of a fluid
such as the pair-correlation function. Examples are the decay of an initial
crystalline structure, e.g., of cubic type (bcc, fcc, sc), when the equilibrium state
is a fluid. Variables sensitive to the bond-orientation (cubic) anisotropy show
a much slower decay than scalar quantities like the energy or the pressure [19]
when the equilibrium state point is close to the fluid-solid phase coexistence.
From a simulation for $N = 1000$ LJ particles which started at simple cubic (sc)
lattice sites with the values $T = 1$, $n = 0.9$ (reduced units) for the temperature
and the density, Fig. 6 shows the (high-frequency) shear modulus $G$ and the
cubic modulus $G_c$ as a function of the (dimensionless) time $t$. These moduli are
related to the Voigt elasticity coefficients $c$ by $G = (c_{11} - c_{12} + 3c_{44})/5$ and
$2G_c = c_{11} - c_{12} - 2c_{44}$. The latter coefficient vanishes in an isotropic state.
These quantities are computed according to

VG = J2 Mr) = ^ r-2 (r-'*<?(r)')' , (25) 

ij 

VGc = ^c(r) = (r-i^(r)')' K^{r), (26) 

where $\mathbf{r}^{ij} = \mathbf{r}^i - \mathbf{r}^j$ is the relative position vector of particles $i, j$, $r = |\mathbf{r}^{ij}|$,
and $\Phi(r^{ij})$ is their interaction potential. The prime denotes differentiation with
respect to $r$, and $V$ is the volume. The symbol $K_4 = (x^4 + y^4 + z^4 - (3/5) r^4)\, r^{-4}$
Fig. 6. The logarithm of the shear modulus $G$ (full line) and of the cubic modulus
$G_c$ (data points) as functions of the dimensionless time $t$ during the relaxation of the
simple cubic structure



is the fourth-order cubic harmonic with the full cubic symmetry. Except for
the start values, the data presented in Fig. 6 have been averaged over 10 time
steps. Notice that $G_c$, for times $t < 1$, decays (approximately) exponentially with
a relaxation time $\approx 0.2$. For $t > 1$, $G_c$ shows strong fluctuations, whereas $G$ has
already reached its stationary value.

Another example is stress relaxation, which is usually just measured in solids.
In simulations, it can also be studied for the fluid phase since it is possible
to impose a deformation instantaneously. To be more specific, consider, at time
$t = 0$, a rescaling of the $x$ and $y$ coordinates of all particles in an equilibrium fluid
state, as well as the multiplication of the pertaining sides of the basic periodicity
box, by the factors $\lambda$ and $1/\lambda$, respectively ($\lambda > 0$). Then the normal pressure
difference $p_{xx} - p_{yy}$, which fluctuates about zero at equilibrium, will suddenly
become nonzero with a value from which one may, for small distortions ($\lambda \approx 1$),
infer the high frequency shear modulus $G$. Expressions used for the calculation of
the components of the pressure tensor are given in the following section. For times
$t > 0$, the particles will rearrange themselves such that their spatial correlations,
on average, become isotropic again. Consequently, $p_{xx} - p_{yy}$ will decay. For small
distortions, the time dependence of the normal pressure difference is equal to
that of the time correlation function which, via a Green-Kubo relation, yields
the (shear) viscosity. An example of a stress relaxation simulation can be found
in [20].
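The instantaneous, volume-preserving deformation described above amounts to a
simple rescaling. A minimal Python sketch, with a hypothetical function name
and an array layout of one row per particle chosen for the illustration, might
read:

```python
import numpy as np

def deform_xy(positions, box, lam):
    """Scale x by lam and y by 1/lam, for the particles and the box alike."""
    scale = np.array([lam, 1.0 / lam, 1.0])
    return np.asarray(positions) * scale, np.asarray(box, dtype=float) * scale

# Small distortion (lam close to 1) of an illustrative three-particle sample
positions = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
box = [10.0, 10.0, 10.0]
new_positions, new_box = deform_xy(positions, box, lam=1.01)
print(new_box, np.prod(new_box))
```

After such a step the simulation is continued with the deformed box, and the
decay of $p_{xx} - p_{yy}$ is recorded as a function of time.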

4.2 Plane Couette Flow 

For a simple shear flow in the $x$ direction with the gradient in the $y$ direction,
the shear rate $\gamma$ is given by

$\gamma = \frac{\partial v_x}{\partial y}$ . (27)

Such a flow can either be generated by moving boundaries or forces [1, 13],
or, as used here and indicated in Fig. 7, by moving image particles undergoing
an ideal Couette flow with the prescribed shear rate (homogeneous shear). Let
the flow be switched on at $t = 0$. Then, at time $t$, the image particles above
(below) the basic (central) box have moved in the $x$ direction to the right (left)
by the distance $\gamma t L$ modulo $L$, where $L$ is the length of the periodicity box
in the $y$ direction. Of course, the periodic boundary conditions for the particles
leaving and entering the basic box have to be modified (Lees-Edwards boundary
conditions, [2]-[8]). For a system in a fluid state in equilibrium and for
not-too-large shear rates, a linear velocity profile typical for a plane Couette flow is set
up in the basic box (from which the data are extracted). At high shear rates,
where plug-like flow also occurs, it is essential to use a velocity “profile unbiased
thermostat” (PUT, [4, 14]). A shear flow can also be generated by modifying
the equations of motion (SLLOD, [7, 8]). For a recent review of NEMD results
for rheological properties of simple and complex fluids see [21].
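The modified periodic boundary conditions can be sketched as follows. The
example assumes a cubic box of side `box` with coordinates in the interval
[0, box), laboratory-frame velocities, flow along x and gradient along y; the
function name and the array layout are illustrative choices, not taken from
the references cited above.

```python
import numpy as np

def lees_edwards_wrap(pos, vel, box, shear_rate, t):
    """Return particles to the basic box under homogeneous shear.

    A particle that leaves through the top (bottom) face re-enters through
    the opposite face, displaced in x by the current offset of the moving
    image boxes, and with its x velocity changed by shear_rate * box.
    """
    pos = np.array(pos, dtype=float)
    vel = np.array(vel, dtype=float)
    offset = (shear_rate * t * box) % box      # image displacement modulo L
    layer = np.floor(pos[:, 1] / box)          # +1 above, -1 below, 0 inside
    pos[:, 0] -= layer * offset
    vel[:, 0] -= layer * shear_rate * box
    pos[:, 1] -= layer * box
    pos[:, 0] %= box                           # ordinary wrapping in x and z
    pos[:, 2] %= box
    return pos, vel

pos = [[0.5, 10.2, 3.0], [4.0, 5.0, 6.0]]      # first particle has left the top
vel = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
new_pos, new_vel = lees_edwards_wrap(pos, vel, box=10.0, shear_rate=0.1, t=2.0)
print(new_pos)
print(new_vel)
```

The same offset has to be used in the minimum-image convention when the
forces between particles in the basic box and their sheared images are
computed; that part is omitted here.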




Fig. 7. Moving periodic images generate a plane Couette flow. The thin circles mark 
the positions the image particles would have without the shear flow 



4.3 Viscosity 

Rheological properties such as the (non-Newtonian) viscosity and normal
pressure differences are obtained from the Cartesian components of the stress tensor
$\sigma_{\mu\nu}$ or of the pressure tensor $p_{\mu\nu} = -\sigma_{\mu\nu}$ which is the sum of “kinetic” and
“potential” contributions:

$p_{\mu\nu} = p^{kin}_{\mu\nu} + p^{pot}_{\mu\nu}$ , (28)

$V p^{kin}_{\mu\nu} = \sum_i m\, c^i_\mu c^i_\nu$ , (29)

$V p^{pot}_{\mu\nu} = \frac{1}{2} \sum_{ij} r^{ij}_\mu F^{ij}_\nu$ . (30)

Here $\mathbf{c}^i$ is the peculiar velocity of particle $i$, i.e. its velocity relative to the flow
velocity $\mathbf{v}(\mathbf{r}^i)$, $\mathbf{r}^{ij} = \mathbf{r}^i - \mathbf{r}^j$ is the relative position vector of particles $i, j$ and
$\mathbf{F}^{ij}$ is the force acting between them. The Greek subscripts $\mu, \nu$ which assume the
values 1, 2, 3 stand for Cartesian components associated with the $x, y, z$ directions.
In the simulations, the expression for the pressure tensor given is averaged over 
many ( 10 ^ to 10 ®) time steps. 

For the present flow geometry, the (non-Newtonian) viscosity $\eta$ is obtained
by dividing the $yx$ (21) component of the stress or pressure tensor by the shear
rate $\gamma$:

$\eta = \sigma_{yx}/\gamma = -p_{yx}/\gamma$ . (31)

The kinetic and potential contributions to the pressure tensor and to the
viscosity can be computed separately from the simulation. Only the sum can be
measured in a real experiment. The kinetic contribution to the viscosity is the
dominating one in dilute gases [11]. In dense fluids (liquids) the potential contribution
is the more important one (Figs. 8, 9). The data shown stem from simulations with
$N = 512$ particles [12]; the interaction has been cut off at $r = r_c = 2.5 r_0$.
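How the kinetic and potential parts of the pressure tensor and the viscosity
might be accumulated in a program is sketched below. The sketch assumes equal
masses and that the pair list contains every interacting pair once, so that
the double sum over $ij$ with its factor 1/2 reduces to a plain sum over pairs;
array layouts and function names are illustrative.

```python
import numpy as np

def peculiar_velocities(vel, pos, shear_rate):
    """Subtract the linear flow profile v_x = shear_rate * y."""
    c = np.array(vel, dtype=float)
    c[:, 0] -= shear_rate * np.asarray(pos, dtype=float)[:, 1]
    return c

def pressure_tensor(mass, pec_vel, pair_r, pair_f, volume):
    """Kinetic and potential parts of the pressure tensor as 3x3 arrays.

    pec_vel : (N, 3) peculiar velocities
    pair_r  : (M, 3) relative position vectors, one row per pair (i < j)
    pair_f  : (M, 3) forces between the partners of each pair
    """
    p_kin = mass * np.einsum("im,in->mn", pec_vel, pec_vel) / volume
    p_pot = np.einsum("pm,pn->mn", pair_r, pair_f) / volume
    return p_kin, p_pot

def viscosity(p, shear_rate):
    """eta = -p_yx / shear_rate; index 1 is y and index 0 is x."""
    return -p[1, 0] / shear_rate

# Tiny made-up configuration, used only to show the calling sequence
c = peculiar_velocities([[1.5, 2.0, 0.0]], [[0.0, 1.0, 0.0]], shear_rate=0.5)
p_kin, p_pot = pressure_tensor(2.0, c, np.array([[1.0, 0.0, 0.0]]),
                               np.array([[3.0, 0.0, 0.0]]), volume=4.0)
print(p_kin[1, 0], p_pot[0, 0], viscosity(p_kin + p_pot, 0.5))
```

In a production run the tensors are averaged over many time steps before the
viscosity is formed, and `viscosity` can be applied to `p_kin` and `p_pot`
separately to obtain the two contributions.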




Fig. 8. The kinetic (open circles) and the potential (closed circles) contributions to
the shear stress $\sigma_{yx} = -p_{yx}$, the potential contributions to the scalar pressure
(squares) and to the normal pressure difference $p_{yy} - p_{xx}$ (diamonds) for an LJ fluid
as functions of the shear rate $\gamma$. The density $n = 0.84\, r_0^{-3}$ corresponds to the triple
point density, the temperature $T = \Phi_0/k_B$ is somewhat higher than the triple point
temperature

Fig. 9. The kinetic (kin, open circles) and potential (pot, closed circles) contributions to
the viscosity $\eta = -p_{yx}/\gamma$, and the total viscosity inferred from the entropy production
(ent, diamonds) for an LJ fluid as functions of the shear rate $\gamma$ for the same density
and temperature as in Fig. 8



Normal stress or pressure differences, e.g., $\sigma_{xx} - \sigma_{yy} = p_{yy} - p_{xx}$, can be
computed analogously, cf. Fig. 8. At small shear rates one has $-p_{yx} \sim \gamma$ and
Pyy - Pxx ~ 7^, as well as p^x + Pyy ~ ‘^Pzz ~ 7^- 

In Fig. 9 the total viscosity is also shown as it follows from the entropy
production, which is proportional to $\eta\gamma^2$ and is determined by the heat removed
from the system by the thermostat.



4.4 Structural Changes 

The shear-rate dependence of the viscosity as displayed in the “flow curve”,
Fig. 9, shows four regimes: (I) the Newtonian flow regime where the shear
viscosity $\eta$ is independent of the shear rate $\gamma$ and where normal pressure
differences practically vanish; in the present case the Newtonian regime corresponds
to $\gamma < 0.1$ (in LJ units); (II) a weak shear thinning for $0.2 < \gamma < 2$; (III) a
strong shear thinning for $2 < \gamma < 20$; (IV) a shear thickening for $\gamma > 20$.

These qualitative differences of the flow behavior are linked with different
flow-induced structural changes in the fluid. In regimes (I) and (II) these can be
noticed in the pair-correlation function $g(\mathbf{r})$ or equivalently in its spatial Fourier
transform, the static structure factor $S(\mathbf{k})$, which determines the scattering
intensity. Both quantities become anisotropic in the presence of a viscous flow.
The structure factor shows distorted Debye-Scherrer rings. In regime (III) a
long-range partial positional ordering takes place which is apparent in real space
and is evident in snapshots [25, 26]. Of course, the long-range ordering is also
seen in $g(\mathbf{r})$ and it leads to Bragg-like peaks in $S(\mathbf{k})$ [10-24].




284 Siegfried Hess 



Above, the various flow regimes have been distinguished by the shear rate
expressed in LJ units. The physically relevant variable, however, is the product
$\gamma\tau$ of the shear rate and the Maxwell relaxation time $\tau$ which, in turn, is given
by the small shear rate limit of the ratio $\eta/G$, i.e., of the viscosity and the (high
frequency) shear modulus $G$. The latter quantity, which can also be extracted
from the simulation, is approximately 25 for the present system and the
relaxation time is $\tau \approx 0.1$ in LJ units. Thus non-Newtonian flow phenomena can be
observed for $\gamma > 0.1\,\tau^{-1}$. In simple fluids such as liquid argon this corresponds
to a shear rate which is several orders of magnitude larger than 10 ® which 
can reasonably be reached in laboratory experiments. The situation is different
in (dense) colloidal dispersions of spherical particles. There, considerably
longer relaxation times occur, so that non-Newtonian effects can be noticed and
are of importance for many applications.
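The orders of magnitude involved can be checked with a short calculation.
The sketch below converts the onset condition $\gamma > 0.1\,\tau^{-1}$ from LJ units to
SI units. The argon parameters (diameter 3.405 Angstrom, well depth
corresponding to 119.8 K, mass 39.948 u) are commonly quoted LJ values that
are assumed here; they are not given in the text.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant in J/K
AMU = 1.66053907e-27     # atomic mass unit in kg

def lj_time_unit(sigma, eps_over_kb, mass_amu):
    """LJ reference time sigma * sqrt(m / eps) in seconds."""
    return sigma * math.sqrt(mass_amu * AMU / (eps_over_kb * K_B))

G = 25.0                 # shear modulus in LJ units, as quoted in the text
tau = 0.1                # Maxwell relaxation time in LJ units, as quoted
eta0 = G * tau           # Newtonian viscosity implied by tau = eta / G

t_ref = lj_time_unit(3.405e-10, 119.8, 39.948)   # assumed argon parameters
gamma_onset_lj = 0.1 / tau
gamma_onset_si = gamma_onset_lj / t_ref          # in 1/s

print(f"eta0 = {eta0} (LJ units), LJ time unit = {t_ref:.3e} s")
print(f"onset of non-Newtonian flow at about {gamma_onset_si:.2e} 1/s")
```

With these assumed parameters the LJ time unit is of the order of picoseconds,
which illustrates why the onset shear rate for a simple liquid lies far above
what a laboratory rheometer can deliver.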

4.5 Colloidal Dispersions 

Dense colloidal dispersions of spherical particles exhibit flow curves which are
qualitatively similar to those presented above. In the extreme shear thinning
regime (III) where partial positional order is observed in the NEMD simulations,
the static structure factor as measured in small-angle neutron scattering (SANS)
experiments agrees very well with that computed from NEMD [24].

In addition to the direct interaction between the dispersed particles and the 
thermostating influence of the solvent already considered in the MD and NEMD 
simulations, particles in a dispersion feel a friction and the pertaining (Brownian) 
fluctuating forces. These additional forces are taken into account in the Brownian 
dynamics (BD) simulations. Results from such a BD simulation are presented 
in [21]. Furthermore, the particles experience hydrodynamic interactions and 
possess rotational degrees of freedom [22]. 

4.6 Mixtures 

Recently, NEMD results were obtained [21] for a binary mixture of particles
A and B interacting with an LJ potential cut off at its minimum (WCA) with
diameters $r_B = 0.58\, r_A$ for the A-A and B-B interactions, $r_{AB} = (r_A + r_B)/2$
for the A-B interaction, masses $m_B = 0.195\, m_A$, and the energy parameters
$\Phi_A = \Phi_B = \Phi_{AB}$. The size ratio corresponds to that occurring in opals, and

mixtures of colloidal particles of this type have been studied experimentally
[28]. The simulations were performed for the high number density $n = 1.26\, r_A^{-3}$
corresponding to a packing fraction of 0.66 where the mixture is in a glassy
state and where the viscosity, for small shear rates, decreases proportional to
7~^. For the various mixtures studied, the total mass density (of the N = 4000 
particles contained in the basic box) was kept constant. In Fig. 10, the viscosity
is shown as a function of the mass fraction of the smaller particles for the four
shear rates $\gamma = 0.005, 0.05, 0.5, 5$. All quantities are in LJ units linked with the
larger particles. At small shear rates, there is a region of concentrations where
the mixture has a smaller viscosity than both pure components.

Fig. 10. The viscosity of mixtures as a function of the mass fraction of the smaller
particles for the values of the shear rate $\gamma$ indicated in the graph. The density and the
temperature are $n = 1.26$ and $T = 1$ in LJ units (H. Voigt)



5 Complex Fluids 



5.1 Polymer Melts 



To model a polymer melt, one starts from a simple fluid of spherical particles and 
introduces extra binding forces or constraints [29, 36] in order to form molecular 
chains with a prescribed chain length of Nch beads. Rheological studies for LJ 
fluids where the binding was achieved by increasing the energy parameter $\Phi_0$
for neighbors in a chain by a factor showed many features of the nonlinear flow 
behavior typical for polymeric melts [27, 30]. 

The results to be presented here [32] follow from an extension of the previous
simulations [31] for a system where all particles interact with the repulsive part
of the LJ potential (WCA) and an attractive FENE potential with the maximal
bond length $R_0$ is used for the binding within the chains. More specifically,

$\Phi = \Phi^{WCA} := 4\Phi_0 \left[ (r/r_0)^{-12} - (r/r_0)^{-6} \right] + \Phi_0$ , $r \le 2^{1/6} r_0$ , (32)

and $\Phi^{WCA} = 0$ for $r > 2^{1/6} r_0$;

$\Phi = \Phi^{FENE} := -0.5\, k^* \Phi_0\, (R_0/r_0)^2 \ln\left[ 1 - (r/R_0)^2 \right]$ , $r < R_0$ , (33)

and $\Phi^{FENE} = \infty$ for $r \ge R_0$. For this potential with $R_0 = 1.5\, r_0$, $k^* = 30$,
$T = \Phi_0/k_B$, and $n r_0^3 = 0.85$ extensive equilibrium MD studies have been
made by Kremer and Grest [33]. In [31, 32] and for the data to be presented
here, the same potential parameters and the same state point are used except
Fig. 11. The viscosity of melts consisting of polymer molecules with the chain lengths
30 (triangles), 100 (diamonds), and 300 (squares) as functions of the shear rate $\gamma$. The
density and the temperature are $n = 0.84$ and $T = 1$ in LJ units (M. Kröger)



for a slightly smaller density of $n r_0^3 = 0.84$. Molecules with chain lengths
$N_{ch} = 10, 30, 60, 100, 150, 200, 300$ and $400$ were studied. The systems contained
$N = 6000$, $8400$ and $30000$ monomers. In Fig. 11, the viscosity $\eta$ is displayed as a
function of the shear rate $\gamma$ for $N_{ch} = 30, 100, 300$. Notice that the values of $\eta$
and $\gamma$, both expressed in LJ units, span a much wider range than the data shown
in Fig. 9 for a simple LJ liquid. The Newtonian limit $\eta_0$ of the viscosity, for long
molecules only reached at extremely small shear rates, is presented in Fig. 12 as
a function of the chain length $N_{ch}$. Two regimes, referred to as a Rouse regime
where rjo ^ Nch and as a reptation regime where rjo ~ N^h^, can be distinguished. 
The transition between these regimes occurs at $N_{ch} \approx 100$. This value is, as
expected, about three times the entanglement length of 35 inferred from
equilibrium studies [33]. A procedure to analyze and measure entanglements in MD
simulations has recently been invented [34]. Other rheological properties, such
as the first and second normal stress differences and the viscometric functions,
can be, and have been, computed [31, 32]. The shear-induced bond orientation as
it can be measured in flow birefringence, as well as the static structure factor
of the whole melt or of marked chains and the shape of single polymer chains,
were analyzed and found to be in good agreement with experiments [35]. Other
geometries, such as extensional flow of polymer melts (M. Kröger), the
deformation of polymers in a glassy state, and systems with prescribed components
of the pressure tensor (H. Voigt) have been simulated. Of course, dilute
polymer solutions have also been studied under nonequilibrium conditions [36, 37].
Recently, “living polymers”, i.e., chains which break and form again such as the
worm-like micelles in surfactant solutions, can also be treated in NEMD
simulations when the interaction potential is appropriately modified [38]. Furthermore,
the investigation of equilibrium properties of chain molecules with stiff and
flexible parts as in main-chain polymeric liquid crystals is under way [39]; NEMD
studies of the intriguingly complex rheological behavior of these substances are
in preparation.

Fig. 12. The Newtonian viscosity $\eta_0$ of polymer melts as a function of the chain length
$N_{ch}$ (M. Kröger)
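The bead-spring interaction used for the melts of this section, a WCA
repulsion between all beads plus a FENE attraction between chain neighbors,
can be written down in a few lines. The sketch below uses the form familiar
from the work of Kremer and Grest, in LJ units ($r_0 = \Phi_0 = 1$) with
$R_0 = 1.5$ and $k^* = 30$; the shift of the WCA part by $\Phi_0$, which makes
the potential vanish continuously at the cutoff, is an assumption, as are the
function names.

```python
import math

R_CUT = 2.0 ** (1.0 / 6.0)      # position of the LJ minimum for r0 = 1

def wca(r, phi0=1.0, r0=1.0):
    """Purely repulsive LJ part, shifted to vanish at the cutoff (assumed)."""
    if r >= R_CUT * r0:
        return 0.0
    s6 = (r0 / r) ** 6
    return 4.0 * phi0 * (s6 * s6 - s6) + phi0

def fene(r, k_star=30.0, bond_max=1.5, phi0=1.0, r0=1.0):
    """Finitely extensible bond potential; infinite at and beyond bond_max."""
    if r >= bond_max:
        return math.inf
    prefactor = -0.5 * k_star * phi0 * (bond_max / r0) ** 2
    return prefactor * math.log(1.0 - (r / bond_max) ** 2)

def bond_potential(r):
    """Total interaction between neighboring beads of a chain."""
    return wca(r) + fene(r)

# Locate the preferred bond length on a fine grid between 0.8 and 1.2
grid = [0.8 + 0.0005 * i for i in range(801)]
r_bond = min(grid, key=bond_potential)
print("preferred bond length:", r_bond)
```

Because the minimum of the combined bond potential lies below the cutoff of
the WCA part, neighboring beads of a chain stay in each other's repulsive
range, and the divergence of the FENE part at `bond_max` keeps chains from
crossing through one another.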

5.2 Nematic Liquid Crystals 

Anisotropy of the Viscosity Coefficients. In nematic liquid crystals, the
viscosity becomes anisotropic when the average direction of the molecules is
fixed by an external magnetic (or electric) field. Four directions are needed to
determine the full anisotropy of the shear viscosity. These cases, indicated by
the labels $i = 1, 2, 3, 4$ for the pertaining shear viscosities $\eta_i$, are the preferential
direction chosen parallel to the flow velocity ($i = 1$), to its gradient ($i = 2$),
to the vorticity which is perpendicular to both ($i = 3$), and to the bisector in
the flow plane ($xy$ plane) ($i = 4$). The first three viscosities are referred to as
Miesowicz coefficients, the difference $\eta_{12} = 4\eta_4 - 2\eta_1 - 2\eta_2$ is called Helfrich
viscosity. In the NEMD simulation, the viscosities are obtained according to

$\eta_i = -p^i_{yx}/\gamma$ , (34)

where $p^i_{yx}$ is the $yx$ component of the pressure tensor as given by (28), with
(29, 30) for the four above-mentioned cases. Of course, the interaction potential
must now be modified appropriately in order to describe nonspherical particles.
Special cases of nonspherical interaction potentials are given later. In the oriented
system, the pressure tensor has an antisymmetric part which is associated with
the torque acting on the particles. This antisymmetric part is used in NEMD
simulations to obtain the Leslie viscosity coefficients $\gamma_1$ and $\gamma_2$ according to

7i + 72 = - p\x)/l > 71 - 72 = 2(p^j, - pl^)h , (35) 

where again the superscripts 1, 2 refer to the orientations mentioned above. Due
to the Onsager-Parodi relation,

$\gamma_2 = \eta_1 - \eta_2$ , (36)

only five of the six “nematic” viscosity coefficients used so far are linearly
independent. In addition to the bulk viscosity, there are two coefficients, also linked
by an Onsager relation, which couple the symmetric traceless and the trace parts
of the pressure and of the velocity gradient tensors. Hence seven coefficients are
needed to describe the viscous properties of nematic liquid crystals [40].
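The bookkeeping behind (34) and (36) is simple enough to be spelled out in
code. The following sketch takes the measured $yx$ pressure components for the
four director orientations and returns the Miesowicz viscosities, the Helfrich
viscosity and the value of $\gamma_2$ that the Onsager-Parodi relation predicts.
The function names and the numbers in the example call are hypothetical and
serve only to show the calling sequence.

```python
def nematic_viscosities(p_yx, shear_rate):
    """Viscosities eta_i = -p_yx^i / shear_rate for i = 1, 2, 3, 4.

    p_yx maps the orientation label i to the time-averaged yx component of
    the pressure tensor measured with the director in orientation i.
    """
    return {i: -p / shear_rate for i, p in p_yx.items()}

def helfrich_viscosity(eta):
    """eta_12 = 4 eta_4 - 2 eta_1 - 2 eta_2."""
    return 4.0 * eta[4] - 2.0 * eta[1] - 2.0 * eta[2]

def gamma2_parodi(eta):
    """Leslie coefficient gamma_2 as given by the Onsager-Parodi relation."""
    return eta[1] - eta[2]

# Made-up pressure components, NOT simulation results
p_yx = {1: -0.05, 2: -0.40, 3: -0.10, 4: -0.25}
eta = nematic_viscosities(p_yx, shear_rate=0.1)
print(eta, helfrich_viscosity(eta), gamma2_parodi(eta))
```

Comparing `gamma2_parodi(eta)` with the value of $\gamma_2$ extracted independently
from the antisymmetric part of the pressure tensor is one way to carry out
the consistency check of the Onsager-Parodi relation mentioned in the text.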

All coefficients, except for the bulk viscosity, have been calculated and the
relation (36) has been tested for model fluids of perfectly oriented ellipsoidal
particles where the nonspherical interaction potential is obtained from a spherical
one, e.g., an LJ or SS interaction, by an affine transformation [41, 42]. Both prolate
and oblate ellipsoids of revolution with axis ratios $Q > 1$ and $Q < 1$, respectively,
were studied. The inequalities

$\eta_1 < \eta_3 < \eta_2$ , $\gamma_2 < 0$ , (37)

typical for nematics ($Q > 1$) are also found in the simulations. For nematic
discotics ($Q < 1$) one has

$\eta_2 < \eta_3 < \eta_1$ , $\gamma_2 > 0$ . (38)

For the affine transformation model, analytic results are available which compare
favorably with simulations of perfectly oriented molecules. Motivated by the
success in the comparison of MD data for the anisotropy of the diffusion in model
fluids with variable degrees of orientation with a modified affine transformation
model [43], similar considerations have been made for the viscosity coefficients
[44].

A comparison of the nematic viscosities obtained from NEMD calculations 
with those based on Green-Kubo relations has been made for a fluid composed 
of particles interacting with a modified Gay-Berne potential [45]. 



Presmectic Behavior. The perfectly oriented ellipsoidal particles discussed
above do not possess a smectic phase. In analogy to the ferrofluids and
magnetorheological fluids discussed in the next section, a relatively simple model was
introduced [46] in which the fluid does undergo a transition from the nematic to
the smectic A phase. Incidentally, it possesses still another smectic phase (smB)
Fig. 13. The Miesowicz viscosity coefficients $\eta_i$, $i = 1, 2, 3$, as a function of the strength
$\varepsilon_{anis}$ of the anisotropic part of the interaction for $T = 0.25$, $n = 0.6$, $\gamma = 0.1$ in SS units
(C. Pereira Borgmeyer)



in addition to the solid state. The potential is that of soft spheres and an
extra anisotropic interaction of $P_2$ symmetry whose strength is determined by the
parameter $\varepsilon_{anis}$:



^ (r-^(r • n)^ - ^ . (39) 

The unit vector $\mathbf{n}$ (director) specifies the preferential direction. For $\varepsilon_{anis} > 0$
this potential models elongated (prolate) particles. At the state point $T = 0.25$,
$n = 0.6$, in SS units, and with the interaction cut off at $r = 2.5\, r_0$, the
transition nematic-smectic A occurs at #anis 2.3 (in units of ^o^). The director 

$\mathbf{n}$ can be chosen parallel to the directions discussed above in connection with
the anisotropy of the viscosity. The resulting Miesowicz viscosities $\eta_{1,2,3}$ are
displayed in Fig. 13 as functions of the anisotropy parameter $\varepsilon_{anis}$. The shear
rate is $\gamma = 0.1$ which, at least in the nematic phase, is in the Newtonian flow
regime. The typical nematic order (37) in the magnitude of the viscosities is
found for $0 < \varepsilon_{anis} < 1$. For $1 < \varepsilon_{anis} < 2.3$, presmectic effects change this
behavior. In the smectic phase, for $\varepsilon_{anis} > 2.8$, the order of the viscosities
corresponds to that of a nematic discotic system (38). The shear flow breaks
the smectic layers but disk-like correlated clusters remain. Of course, the
shear-induced structural changes can be, and have been, analyzed with standard methods.
Model fluids of short-chain molecules with a stiff central part and flexible ends
possess a broad smectic A phase [39]. The study of their rheological properties and
of the shear-induced structural changes is in preparation.
Fig. 14. The viscosity coefficients $\eta_1$, $\eta_2$ as a function of the strength $\varepsilon_{mag}$ of the
dipole-dipole interaction for $T = 0.25$, $n = 0.6$, $\gamma = 0.06$ in SS units (T. Weider)



5.3 Ferrofluids and Magneto-Rheological Fluids 



Spherical colloidal particles with a magnetic core, such as occur in ferrofluids in 
the presence of an applied magnetic field, have been modelled by soft spheres 
plus a dipole-dipole interaction [47]: 



^ = ^ 



ss 

0 





(40) 



The parameter $\varepsilon_{mag} > 0$ is proportional to the square of the (induced) magnetic
moments of the particles, which are parallel to $\mathbf{n}$. The angular dependence of the
nonspherical part of the interaction potential is the same as in (39); the sign of
the prefactor and the $r$ dependence, however, are different. Pairs of particles feel
a disk-like interaction since, for fixed relative kinetic energy, they can approach
each other more closely in the direction parallel to $\mathbf{n}$ than in the perpendicular
directions. Thus it is not surprising that ferrofluids show an anisotropy analogous
to nematic discotic liquid crystals [40, 42]. When the dipole-dipole interaction is
stronger, however, chains are formed which, at higher densities, are arranged in
partially ordered spatial structures. This affects the viscous behavior in a
dramatic way. An example is shown in Fig. 14 in which the viscosities $\eta_1$ (magnetic
field parallel to the flow velocity) and $\eta_2$ (magnetic field parallel to the gradient
of the flow velocity) are plotted as functions of the anisotropy parameter $\varepsilon_{mag}$.
The state point is $T = 0.25$, $n = 0.6$, in SS units and the shear rate is $\gamma = 0.06$.
The interaction is again cut off at $r = 2.5\, r_0$; $N = 1000$ particles are used [48].
For $0 < \varepsilon_{mag} < 3$, the discotic behavior $\eta_1 > \eta_2$ is observed. For $\varepsilon_{mag} > 3$, the
viscosity $\eta_2$ for the field parallel to the gradient direction increases strongly with
increasing $\varepsilon_{mag}$. Notice that a logarithmic scale is used for the viscosity. A yield
stress occurs for the higher values of the dipole-dipole interaction. This is typical
for the magnetorheological (MR) fluids which are similar to the ferrofluids but
are composed of particles with stronger dipole-dipole interactions and usually
contain a higher volume fraction of colloidal particles. Electrorheological (ER)
fluids can, with appropriate modifications, also be treated theoretically by the
model.

Abstract. We describe the fundamental principles of molecular-dynamics
simulations of structure formation in real materials at finite temperature.
Various concepts for the calculation of total energies and interatomic forces
are reviewed: classical concepts based on the construction of empirical
potentials, and quantum-mechanical concepts that combine the atom dynamics with
a simultaneous solution of the electron problem of the many-atom configuration
within density-functional theory. Of these concepts, we introduce in more detail
a parameter-free density-functional-based nonorthogonal tight-binding scheme.
This method combines the simplicity and efficiency of semiempirical
tight-binding approaches with the accuracy and transferability of ab initio
calculations. After describing the simulation geometries and regimes for
clusters, bulk structures and surface modifications, the accuracy and high
transferability of the interatomic potentials to simulations of systems of all
scales, including heteronuclear interactions, are verified. Various successful
applications of the method to the study of C60 polymerization, the stability of
highly tetrahedral amorphous carbon and the characterization of diamond surface
reconstructions are summarized.



1 Introduction 

There is currently a growing interest in materials science at an atomic level of 
understanding of physical and chemical properties of real structures. While com- 
mon experimental scattering techniques only provide a one-dimensional struc- 
turally averaged picture of the atomic arrangement, spectroscopic data are more 
sensitive to the local chemical bonding environment and local defect configu- 
rations. However, this information is also averaged over a considerable sample 
volume. Real local probes have become available only recently. Well-known ex- 
amples are atomic-scale imaging of surfaces by scanning tunneling microscopy 
(STM) [1] or single molecule spectroscopy (SMS) [2] in fluids or solids. These 
techniques are now modified for very different applications. However, the very 
high complexity of the measured signal in relation to energy-dependent charge 
density distributions, vibrational excitations, and electronic transitions makes a 
further theoretical treatment necessary. 

Going beyond phenomenological considerations, an atomic-scale interpreta- 
tion of the experimental data has to be based on realistic structure models (stable 
and metastable minimal-energy configurations) of various-scale complex systems 
ranging from clusters and molecules to crystalline and amorphous solids and solid 
surfaces. Starting from the atomic coordinates of these systems, a detailed
theoretical analysis of material properties becomes possible. In combination with
the experimentally derived mechanical, vibrational, electronic, and optical data, 
these results may lead to a fundamental understanding of the material physics 
and chemistry. 

In a further step, even more interesting questions can be raised about how 
particular structures with desirable properties are going to be formed in op- 
timized technological processes. For instance, there are various applications of 
molecular-dynamics modeling to study thin-film growth on solid substrate sur- 
faces [3]. Applying rather different approaches ranging from very crude binary 
collision modeling [4] through empirical potentials [5] up to highly sophisticated 
density-functional methods [6] , the thin-film formation by energetic particles [7] , 
the nucleation of amorphous bulk materials [8], and elementary mechanisms in 
homoepitaxial and heteroepitaxial growth [9] are currently under investigation.
In such simulations, a rapidly increasing number of new hypothetical structural
configurations of (possibly metastable) clusters, fullerenes, nanotubes, and
solids with interesting properties has been predicted. In this context, computer
simulations have developed into a valuable tool for the structural design of
real materials and complex systems.



2 Simulation Methods 



In order to compare theoretical results to experimental data, the ultimately 
investigated model structures should represent stable or metastable minimal- 
energy configurations which have to be obtained by finite-temperature structure 
optimization. The two main techniques which can be applied are the Monte 
Carlo (MC) and the molecular-dynamics (MD) simulated annealing (SA) techniques
[10, 11]. Both methods have to rely on a mathematical description of the total
energy of the system as a function of all atomic coordinates,
$E_{\rm tot}(\{\mathbf R_l\})$. Whereas the MC method follows a stochastic path
to search for the minimal-energy configuration by repeatedly generating random
atom configurations that are weighted by the total energy itself, the MD scheme
determines the atom trajectories $\{\mathbf R_l(t), \mathbf P_l(t)\}$ that are
driven by the interatomic forces.

In both methods, simulated annealing means the consideration of finite
temperature during the minimal-energy search. If an MC optimization step results
in a configuration with lower energy, it will be accepted as the new reference
system for the following steps. However, if an energetically less favorable
configuration is produced, the new structure is only accepted if the Boltzmann
probability for this configuration, defined as
$P_B = \exp\{-(E_{\rm tot}^{\rm new} - E_{\rm tot}^{\rm old})/k_B T\}$, is larger
than some random number out of the interval (0,1). In this way, a
finite-temperature noise is introduced into the search path, which allows the
system to overcome energy barriers in favor of establishing the global rather
than a local minimum.
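
The acceptance rule just described can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the one-dimensional
double-well energy, the geometric cooling schedule, the step size and all
function names are hypothetical choices made for the example, not part of the
method described in the text.

```python
import math
import random


def metropolis_accept(e_old, e_new, kT, rng=random):
    """Accept downhill moves always, uphill moves with Boltzmann probability."""
    if e_new <= e_old:
        return True
    p_b = math.exp(-(e_new - e_old) / kT)
    return p_b > rng.random()


def double_well(x):
    """Toy energy (assumption): local minimum near x = +1, global near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def anneal(energy, x0, kT_start=1.0, kT_end=0.01, steps=20000,
           step_size=0.5, seed=0):
    """Monte Carlo simulated annealing with a geometric cooling schedule."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for n in range(steps):
        kT = kT_start * (kT_end / kT_start) ** (n / (steps - 1))
        x_trial = x + rng.uniform(-step_size, step_size)
        e_trial = energy(x_trial)
        if metropolis_accept(e, e_trial, kT, rng):
            x, e = x_trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


if __name__ == "__main__":
    # Start in the higher-lying well; the thermal noise lets the walker cross
    # the barrier and find the lower well.
    print(anneal(double_well, x0=1.0))
```

At high temperature almost every trial move is accepted, so the walker samples
both wells; as kT decreases, uphill moves become rare and the search settles
into a minimum.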

During the MD, the incorporation of finite temperature is achieved more
naturally by coupling the kinetic energy of the particles (mass $M_l$, velocity
$\mathbf v_l$) with the thermal energy of the system (equipartition theorem),

$$ \sum_{l=1}^{N} \frac{1}{2} M_l \mathbf v_l^2 = \frac{3}{2} N k_B T . \tag{1} $$

Focusing further on the MD method, the Newton equations of motion for all 
atoms in the structure have to be solved. 



$$ \mathbf F_l = M_l \ddot{\mathbf R}_l = -\frac{\partial E_{\rm tot}(\{\mathbf R_{l'}\})}{\partial \mathbf R_l} . \tag{2} $$



To realize this on a computer, one makes use of time-discretization tech- 
niques [12] using finite difference methods. From the difference equations one 
derives recursion relations for the positions and/or velocities (momenta). These
algorithms proceed in the time direction yielding the phase-space trajectories of 
all atoms. 

The most straightforward discretization of the differential equation stems
from the Taylor series expansion of the atom positions $\mathbf R_l$ at the time
$t + h$, one small time step $h$ ahead:

$$ \mathbf R_l(t+h) = \mathbf R_l(t) + \sum_{i=1}^{n-1} \frac{h^i}{i!}\, \mathbf R_l^{(i)}(t) + X_n , \tag{3} $$

where $X_n$ gives the error involved in the approximation and
$\mathbf R_l^{(i)}$ is the i-th time derivative of the particle position. Using
this equation in two steps at n = 3, a very simple, accurate and efficient
algorithm is derived:



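$$ \mathbf R_l(t+h) = \mathbf R_l(t) + h \frac{d\mathbf R_l(t)}{dt} + \frac{h^2}{2} \frac{d^2\mathbf R_l(t)}{dt^2} + X_3 , \tag{4} $$

$$ \mathbf R_l(t-h) = \mathbf R_l(t) - h \frac{d\mathbf R_l(t)}{dt} + \frac{h^2}{2} \frac{d^2\mathbf R_l(t)}{dt^2} + X_3' . \tag{5} $$

Note that $X_3 \neq X_3'$. Adding the two equations and neglecting higher-order
corrections, we derive the Verlet algorithm [13],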



$$ \mathbf R_l(t+h) = 2\mathbf R_l(t) - \mathbf R_l(t-h) + \frac{h^2}{M_l} \mathbf F_l(t) . \tag{6} $$



In this form the recursion relations produce only the positions. However, it is
important to know the velocities in order to control the kinetic energy or to
study transport properties via the velocity autocorrelation function, for
example. The velocity can be approximated as follows:

$$ \mathbf v_l(t) = \frac{\mathbf R_l(t+h) - \mathbf R_l(t-h)}{2h} . \tag{7} $$

Notice that at time $t + h$ the computed velocities are those of the previous
time step. Hence, the kinetic energy is one step behind the computed potential
energy. In order to achieve control of the energy conservation it would be
desirable to determine the atomic positions and velocities at the same time. By
using $\mathbf v_l(t) = \{\mathbf R_l(t+h) - \mathbf R_l(t)\}/h$ in (6), we
finally arrive at

$$ \mathbf R_l(t) = \mathbf R_l(t-h) + h\, \mathbf v_l(t-h) , \tag{8} $$

$$ \mathbf v_l(t) = \mathbf v_l(t-h) + \frac{h}{M_l} \mathbf F_l(t) , \tag{9} $$

thus representing together with (6, 7) two simple and efficient algorithms.
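
Both recursion schemes can be tried out on a one-dimensional harmonic
oscillator. The following Python sketch is meant as an illustration only: the
test force, the unit mass and spring constant, the Taylor-step start-up of the
Verlet recursion and all function names are assumptions made for this example.

```python
import math


def verlet(force, mass, r0, v0, h, steps):
    """Positions from eq. (6), central-difference velocities from eq. (7)."""
    r_prev = r0
    # Start-up: R(h) from a second-order Taylor step (assumption of this sketch).
    r = r0 + h * v0 + 0.5 * h * h * force(r0) / mass
    traj = [(0.0, r0, v0)]
    for n in range(1, steps + 1):
        r_next = 2.0 * r - r_prev + h * h * force(r) / mass  # eq. (6)
        v = (r_next - r_prev) / (2.0 * h)                     # eq. (7), time n*h
        traj.append((n * h, r, v))
        r_prev, r = r, r_next
    return traj


def euler_type(force, mass, r0, v0, h, steps):
    """Scheme (8, 9): positions and velocities advanced together."""
    r, v = r0, v0
    traj = [(0.0, r, v)]
    for n in range(1, steps + 1):
        r = r + h * v                 # eq. (8)
        v = v + h * force(r) / mass   # eq. (9), force at the new position
        traj.append((n * h, r, v))
    return traj


def harmonic(r):
    """Test force F = -k r with k = 1 (assumption)."""
    return -r


def energy(r, v, mass=1.0, k=1.0):
    """Total energy of the test oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * r * r


if __name__ == "__main__":
    for scheme in (verlet, euler_type):
        t, r, v = scheme(harmonic, 1.0, 1.0, 0.0, 0.01, 1000)[-1]
        print(scheme.__name__, t, r, math.cos(t), energy(r, v))
```

In both schemes the total energy of the oscillator stays close to its initial
value of 0.5 instead of drifting, which is the property one wants to monitor in
a production run.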

However, before going into any MD simulation, we have to turn back to the 
basic principles of the calculation of total energies and interatomic forces. 

3 Total Energies and Interatomic Forces 

Currently, rather different theoretical approaches are applied in performing total 
energy calculations on clusters and extended systems. 



3.1 Classical Concepts 

These are fast, computationally efficient and based on the construction of pa- 
rameterized empirical potentials. The potential parameters are typically fitted 
to data of equilibrium crystalline structures provided by experiments or ab initio 
methods which will be discussed later. The total energy of the system may be 
described by many-body potentials expanding the interatomic force terms in 
increasingly collective interactions (binary, three-body, . . . ), 

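$$ E_{\rm tot}(\{\mathbf R_i\}) = \sum_{i}^{N} \sum_{j>i}^{N} V_2(\mathbf R_i, \mathbf R_j) + \sum_{i}^{N} \sum_{j>i}^{N} \sum_{k>j}^{N} V_3(\mathbf R_i, \mathbf R_j, \mathbf R_k) + \dots \tag{10} $$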

As the simplest pair potential neglecting higher-order terms, the
Lennard-Jones-type potential (LJP) [14],

$$ V_2(\mathbf R_i, \mathbf R_j) = 4\epsilon \left[ \left(\frac{\sigma}{r_{ij}}\right)^{12} - \left(\frac{\sigma}{r_{ij}}\right)^{6} \right] , \tag{11} $$

has been used to model noble gases accurately and to yield reliable bond
energies and bond lengths. Here $r_{ij}$ is the interatomic distance, and
$\epsilon$, $\sigma$ are parameters determined by fitting to known properties of
the gas phase such as viscosity, virial coefficients, etc. The potential is
suitable for atoms which are only slightly distorted from their stable
closed-shell configurations. The attractive forces between the atoms are due to
fluctuating dipole interactions, which vary as the inverse sixth power of the
interatomic distance. However, when the atoms get too close to each other, the
repulsion of the ionic cores and, in part, the repulsion of the filled electron
shells (Pauli's exclusion principle) become dominant. The balance between both
parts finally defines the equilibrium interatomic distance $2^{1/6}\sigma$ and
the binding energy $-\epsilon$. Also, the curvature of the potential well
determines the force constant for vibrational motions.
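
The properties of the potential well quoted above are easy to check
numerically. The sketch below is an illustration in reduced units
($\epsilon = \sigma = 1$ by default, an assumption of this example); the
function names are hypothetical.

```python
def lj_potential(r, eps=1.0, sigma=1.0):
    """Lennard-Jones pair potential, eq. (11)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def lj_force(r, eps=1.0, sigma=1.0):
    """Radial force -dV/dr; positive values are repulsive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r


def lj_force_constant(eps=1.0, sigma=1.0):
    """Curvature d2V/dr2 at the minimum r_min = 2**(1/6) * sigma."""
    return 72.0 * eps / (2.0 ** (1.0 / 3.0) * sigma ** 2)


if __name__ == "__main__":
    r_min = 2.0 ** (1.0 / 6.0)
    # Potential depth, vanishing force and curvature at the minimum.
    print(lj_potential(r_min), lj_force(r_min), lj_force_constant())
```

The potential crosses zero at $r = \sigma$, has its minimum of depth
$-\epsilon$ at $r = 2^{1/6}\sigma$, and the curvature there,
$72\,\epsilon / (2^{1/3}\sigma^2)$, plays the role of the vibrational force
constant.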

Although the parameterization has been developed by fitting to gas-phase 
data, the LJP is found to accurately describe the bond energies and equilibrium 
bond length of the solid state of noble gases as well. Due to its mathematical
simplicity, the LJP has also been widely used for modeling covalent and
especially metallic systems in the past. However, it can be quite incorrect for
those materials, because it does not include the nature of covalent bonding.

The determination of energy and characterization of chemical bonding in 
covalent systems is much more difficult because the bond energies and bonding 
neighbor arrangements are strongly determined by the local coordination around 
each atom. For example, a bond between two carbon atoms can be a single, 
double, or triple bond, depending on how many atoms are in the nearest neighbor 
sphere at particular bond angles. Typical many-body semiconductor potentials 
such as the Stillinger-Weber (SW) [15] or Biswas-Hamann (BH) [16] potentials
are terminated with the three-body terms. Considering for example the SW 
potential, it is preferentially fixed for a certain hybridization type of the atom- 
bonding configurations and does not allow for any hybridization change forced by 
the neighboring atom arrangement. To deal more successfully with this problem, 
in 1987 Tersoff suggested a potential which may be written in terms of effective 
two-particle contributions [17], 

$$ E_{\rm tot} = \frac{1}{2} \sum_{i \neq j} V_{ij} , \tag{12} $$

$$ V_{ij} = f_C(r_{ij}) \left[ a_{ij} f_R(r_{ij}) + b_{ij} f_A(r_{ij}) \right] . \tag{13} $$

The effective-pair potential $V_{ij}$ consists of a Morse-type potential,

$$ f_R + f_A = A e^{-\lambda_1 r_{ij}} - B e^{-\lambda_2 r_{ij}} , \tag{14} $$

modeling a potential well similar to that of the LJP parameterization by the
superposition of a repulsive and an attractive contribution, which are modified
by the coefficients $a_{ij}$ and $b_{ij}$. $f_C(r_{ij})$ is a simple cutoff
function limiting the range of the interatomic interaction. Whereas in most
cases considered up to now (Si and C) the coefficient of the repulsive term
$a_{ij}$ is chosen to be equal to 1, the coefficient $b_{ij}$ of the attractive
term represents an angle-sensitive bonding strength between atoms i and j
depending on how many additional binding neighbors these atoms have. By this
coordination-dependent control of the attractive forces, it becomes possible to
model many-atom potential contributions which are able to describe
coordination-dependent hybridization changes between sp, sp$^2$ and sp$^3$, as
needed in the carbon bonding cases. For more details we refer the reader
to [17].

In application to carbon, a total of 12 parameters have been fitted to properly
describe the energetics and structural properties of the zero-pressure graphite
and diamond phases, as well as the higher-coordinated high-pressure sc and bcc
lattices. The potential has been shown to be very efficient for use in MD
simulations and to work well for describing the properties of crystalline
systems. But like all empirical potentials, the Tersoff potential can hardly be
transferred to bonding situations which have not been included in the
parameterization. Consequently, small clusters, defects, amorphous structures
and atom configurations on surfaces are described incorrectly in many cases. For
example, the dihedral angular interactions orienting the bonding planes of
paired undercoordinated sp$^2$ atoms in an amorphous matrix, favoring their
$\pi$-bonding, are completely missing. Additional effort made by D. Brenner to
generalize the Tersoff potential [18] has been successful in partly removing
these problems and including interactions with hydrogen. However, because it
includes even more parameters, such a scheme is hardly applicable for describing
additional heteronuclear interactions, needed for doping and alloying studies.



3.2 Density-Functional Theory, Car-Parrinello MD



The problem of transferability is solved in general by using ab initio MD
concepts on the basis of density-functional theory (DFT). Hohenberg and Kohn
proved that the ground-state total energy of an electron gas, including exchange 
and correlation may be expressed as a unique functional of the electron den- 
sity [19], even in the presence of an external static potential. Kohn and Sham 
further showed how the many-electron problem may formally be replaced by 
an exactly equivalent set of self-consistent one-electron equations [20].
Following these guidelines, the total energy of a system with an electronic
density $n(\mathbf r)$ can be written as:

$$ E[n(\mathbf r)] = T_0[n(\mathbf r)] + \frac{1}{2} \int\!\!\int \frac{n(\mathbf r)\, n(\mathbf r')}{|\mathbf r - \mathbf r'|}\, dV\, dV' + \int n(\mathbf r)\, V_{\rm nucl}(\mathbf r)\, dV + \int n(\mathbf r)\, \varepsilon_{\rm xc}[n(\mathbf r)]\, dV . \tag{15} $$

In the above equation, $T_0$ is the kinetic energy functional for a
non-interacting electron gas; the second and third terms represent the
electron-electron and electron-nuclear Coulomb energies, respectively. The last
term is referred to as the exchange-correlation energy. For the ground-state
density, $E[n(\mathbf r)]$ has a minimum, thus including particle conservation:

$$ \frac{\delta \left\{ E[n(\mathbf r)] + \mu \left( N - \int n(\mathbf r)\, dV \right) \right\}}{\delta n(\mathbf r)} = 0 . \tag{16} $$

The additional Lagrange parameter $\mu$ guarantees the correct normalization
of $n(\mathbf r)$ (N is the number of electrons). Using (16) and the assumption
that the electron density can be expressed as a sum of single-particle
wavefunction densities,

$$ n(\mathbf r) = \sum_{i=1}^{N} |\psi_i(\mathbf r)|^2 , \tag{17} $$
leads to the Kohn-Sham equation:

$$ \left[ -\frac{1}{2}\nabla^2 + \int \frac{n(\mathbf r')}{|\mathbf r - \mathbf r'|}\, dV' + V_{\rm nucl}(\mathbf r) + V_{\rm xc}[n(\mathbf r)] \right] \psi_i(\mathbf r) = \epsilon_i \psi_i(\mathbf r) , \tag{18} $$

$$ V_{\rm xc}[n(\mathbf r)] = \frac{\delta \left( n(\mathbf r)\, \varepsilon_{\rm xc}[n(\mathbf r)] \right)}{\delta n(\mathbf r)} . \tag{19} $$

This equation is similar to the Schrödinger equation; however, the Coulomb
potential due to the nuclear charges $V_{\rm nucl}$ is replaced by an effective
potential $V_{\rm eff}$,

$$ V_{\rm eff}(\mathbf r) = \int \frac{n(\mathbf r')}{|\mathbf r - \mathbf r'|}\, dV' + V_{\rm nucl}(\mathbf r) + V_{\rm xc}[n(\mathbf r)] , \tag{20} $$

which depends also on the electron density and single-particle wavefunctions.
Consequently, one has to deal with a self-consistent (scf) problem which is
usually solved by an iterative procedure. In practical applications, the
$\psi_i$ are expanded in terms of different basis functions. Consequently, the
solution of the Kohn-Sham equations is transformed into the solution of a matrix
eigenvalue problem.

The strength of density-functional theory lies in the fact that reasonable
approximations have been found to express the exchange-correlation energy
density $\varepsilon_{\rm xc}[n(\mathbf r)]$. Widely used is the local-density
approximation (LDA), whereby $\varepsilon_{\rm xc}[n(\mathbf r)]$ depends only
on the local electron density:

$$ \varepsilon_{\rm xc}[n(\mathbf r)] = \varepsilon_{\rm xc}(n(\mathbf r)) . \tag{21} $$



The best available LDA parameterizations for $\varepsilon_{\rm xc}$ are based on
Quantum-Monte-Carlo calculations performed for the uniform electron gas. They have been
proven to adequately describe equilibrium properties of clusters, molecules, and 
solids, such as electronic and geometric structures, vibrational frequencies, and 
relative energies of different phases. However, there are also serious problems. For 
instance, the electronic gap in semiconductors is significantly underestimated. 

If one has to perform density-functional-based dynamic simulations in atomic 
systems, one can take advantage of the fact that the electron mass is much 
smaller than the mass of any nucleus and hence the motion of the nuclei can be 
treated in a classical way whereas the electrons will follow the nuclei adiabati- 
cally. This approximation is called the Born-Oppenheimer approximation. Con- 
sequently, one can calculate the ground-state electron density for a given atomic 
configuration and then determine the forces as derivatives of the total energy 
with respect to the nuclear coordinates. These forces can be used in molecular- 
dynamics (MD) simulations. However, the determination of the ground-state 
density is very time consuming due to the necessary scf procedure. 

For that reason, Car and Parrinello [21] developed a different scheme to use
in MD simulations. They introduced a fictitious Lagrangian for the two sets
of independent degrees of freedom $\mathbf R_l$ (nuclear coordinates) and
$\psi_i$ (electronic wavefunctions):

$$ L = \sum_i \frac{\mu_i}{2} \int |\dot\psi_i(\mathbf r)|^2\, dV + \sum_l \frac{M_l}{2} \dot{\mathbf R}_l^2 - E[\{\psi_i\}, \{\mathbf R_l\}] + \sum_{i,j} \Lambda_{ij} \left( \int \psi_i^*(\mathbf r)\, \psi_j(\mathbf r)\, dV - \delta_{ij} \right) , \tag{22} $$

leading to the equations of motion:

$$ \mu_i \ddot\psi_i(\mathbf r, t) = -\frac{\delta E}{\delta \psi_i^*(\mathbf r, t)} + \sum_k \Lambda_{ik}\, \psi_k(\mathbf r, t) , \tag{23} $$

$$ M_l \ddot{\mathbf R}_l = -\nabla_{\mathbf R_l} E . \tag{24} $$

$M_l$ are the nuclear masses and $\mu_i$ are fictitious masses related to the
electronic degrees of freedom. The Lagrange parameters $\Lambda_{ij}$ have been
introduced to guarantee the orthonormality of the $\psi_i$. The Car-Parrinello
scheme has the advantage that electronic and nuclear degrees of freedom can be
treated simultaneously, thus avoiding the explicit determination of the
electronic ground state for each geometry.

Although there has been much success in applying these methods to ever 
larger systems, they are still too slow for the investigation of many interest- 
ing problems. On the other hand, the accurate ab initio calculations based on 
scf density-functional (DF) [21, 22] or Hartree-Fock (HF) [23] theory represent 
without any doubt very reliable benchmarks for all other methods. 

Because empirical potentials transfer poorly to systems of all scales, in
particular those with heteronuclear interactions, and ab initio methods are
time-consuming, semiempirical techniques have been developed to simulate
extended systems with reasonable computational costs. In addition to numerous
traditional quantum chemical methods, tight-binding (TB) schemes have been 
very successful [26, 27, 28, 29]. These methods can be seen as simplified ab initio 
methods which still use the formalism of quantum mechanics to determine the 
electronic properties of the system. However, instead of actually calculating the 
Hamiltonian matrix, the elements of this matrix are fitted to reproduce an arbi- 
trary set of input data. In general, only two-center contributions to the Hamilto- 
nian matrix are considered: that means that terms which include wavefunctions 
and potentials on three different atoms are neglected. In many cases, the results 
of these schemes deviate only slightly from those of more sophisticated methods. 
However, the usual way of fitting the electronic Hamiltonian (matrix elements)
to some input data is rather complicated and not very straightforward. That is
why there is considerable interest in almost parameter-free tight-binding
methods such as the one developed by our group in collaboration with Gotthard
Seifert from the Technical University Dresden, which has almost the accuracy of
scf LDA calculations but does not suffer from the drawback of exploding
computational costs with growing system size. Our method is a nonorthogonal TB
scheme, which means that the basis functions used in the calculations are
allowed to overlap.

Our method tries to avoid the difficulties arising from an empirical parameter- 
ization by calculating the elements of Hamiltonian and overlap matrices ab initio 
out of a local orbital basis with the help of DFT LDA and some integral ap- 
proximations. For this reason, it can be seen as an approximate LCAO DFT 
scheme yielding exactly the same energy expression as common nonorthogonal 
TB schemes. The only, but important, difference is that there is a well-defined 
procedure for how to determine the desired matrix elements. We will refer to 
our method as a nonorthogonal density-functional-based tight-binding (DF-TB)
scheme. As in usual TB formulations, only two-center Hamiltonian matrix ele- 
ments are considered. Despite the extreme simplicity of this approach compared 
to scf [21, 22] and ab initio calculations using the Harris functional [24, 25], the 
method has proven to be transferable to complex carbon and hydrocarbon systems
and recently to Si(H), Ge(H), SiC, BN, CN systems, GaAs and SiO$_2$. In this way
we support discussions in the literature [25, 28, 29] that nonorthogonality is a
key to transferability, but in our opinion this has to be combined with a
first-principles-based derivation of all interaction and overlap matrix elements.



4 Density-Functional Based Tight-Binding Method 

Our method [30], based on the work of Seifert, Eschrig and Bieger [31, 32], ap- 
plies the formalism of optimized linear combination of atomic orbitals (O-LCAO) 
as introduced by Eschrig and Bergert for band structure calculations [33]. In this 
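approximation, the Kohn-Sham orbitals $\psi_i$ of the system are expanded in
terms of atom-centered localized basis functions $\phi_\nu$,

$$ \psi_i(\mathbf r) = \sum_\nu C_{\nu i}\, \phi_\nu(\mathbf r - \mathbf R_k) , \tag{25} $$

solving the Kohn-Sham equations in an effective one-particle potential
$V_{\rm eff}(\mathbf r)$:

$$ \hat H \psi_i(\mathbf r) = \epsilon_i \psi_i(\mathbf r) , \qquad \hat H = \hat t + V_{\rm eff}(\mathbf r) . \tag{26} $$

As a result, the Kohn-Sham equations are transformed into a set of algebraic
equations:

$$ \sum_\nu C_{\nu i} \left( H_{\mu\nu} - \epsilon_i S_{\mu\nu} \right) = 0 , \qquad \forall\, \mu, i , \tag{27} $$

where

$$ H_{\mu\nu} = \langle \phi_\mu | \hat H | \phi_\nu \rangle , \qquad S_{\mu\nu} = \langle \phi_\mu | \phi_\nu \rangle . \tag{28} $$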
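
For the smallest nontrivial case, two overlapping orbitals, the secular
equations (27) can be solved in closed form, since
$\det(H - \epsilon S) = 0$ is then a quadratic equation in $\epsilon$. The
Python sketch below illustrates this; the numerical matrix elements are
hypothetical values chosen for the example, not results of the method.

```python
import math


def generalized_eig_2x2(H, S):
    """Eigenvalues of H c = eps S c for real symmetric 2x2 matrices H and S.

    S must be positive definite. The eigenvalues are the roots of
    det(H - eps S) = a eps^2 + b eps + c = 0, returned in ascending order.
    """
    (h11, h12), (_, h22) = H
    (s11, s12), (_, s22) = S
    a = s11 * s22 - s12 * s12
    b = -(h11 * s22 + h22 * s11 - 2.0 * h12 * s12)
    c = h11 * h22 - h12 * h12
    disc = math.sqrt(max(b * b - 4.0 * a * c, 0.0))
    return sorted(((-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)))


if __name__ == "__main__":
    # Two identical orbitals: on-site energy e0, hopping t, overlap s
    # (illustrative numbers only).
    e0, t, s = -0.5, -0.4, 0.3
    H = [[e0, t], [t, e0]]
    S = [[1.0, s], [s, 1.0]]
    print(generalized_eig_2x2(H, S))
```

For two identical orbitals the roots reduce to $(e_0 + t)/(1 + s)$ and
$(e_0 - t)/(1 - s)$, which shows how the overlap shifts bonding and antibonding
levels asymmetrically, in contrast to an orthogonal TB model.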

It has already been shown by a number of authors [34, 35, 36, 37] that the
total energy of the system can be approximated as a sum over the band structure
energy (sum of the eigenvalues of all occupied Kohn-Sham orbitals) and a
short-range repulsive two-body potential:

$$ E_{\rm tot}(\{\mathbf R_k\}) = E_{\rm BS}(\{\mathbf R_k\}) + E_{\rm rep}(\{|\mathbf R_k - \mathbf R_l|\}) = \sum_i n_i\, \epsilon_i(\{\mathbf R_k\}) + \sum_l \sum_{k<l} V_{\rm rep}(|\mathbf R_l - \mathbf R_k|) , \tag{29} $$

where $n_i$ is the occupation number of orbital i.

Interatomic forces for MD applications can easily be derived from an exact 
calculation of the gradients of the total energy at the considered atom sites, 



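$$ \mathbf F_l = -\frac{\partial E_{\rm tot}(\{\mathbf R_k\})}{\partial \mathbf R_l} = -\sum_i n_i \sum_\mu \sum_\nu C_{\mu i} C_{\nu i} \left[ \frac{\partial H_{\mu\nu}}{\partial \mathbf R_l} - \epsilon_i \frac{\partial S_{\mu\nu}}{\partial \mathbf R_l} \right] - \frac{\partial E_{\rm rep}}{\partial \mathbf R_l} . \tag{30} $$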



In order to get the necessary matrix elements and the repulsive contributions
$V_{\rm rep}$, we perform the construction of our potential in three steps which
are discussed in detail below:



1. Creation of (spin-unpolarized) pseudoatoms by solving a modified atomic 
Kohn-Sham equation, 

2. Calculation of all Hamiltonian and overlap matrix elements, 

3. Fitting of the short-range repulsive potential $V_{\rm rep}$.



4.1 Creation of the Pseudoatoms 

We write the pseudoatomic wavefunctions in terms of Slater-type orbitals and 
spherical harmonics: 

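$$ \phi_\nu(\mathbf r) = \sum_{n, \alpha, l_\nu, m_\nu} a_{n\alpha}\, r^{l_\nu + n}\, e^{-\alpha r}\, Y_{l_\nu m_\nu}\!\left( \frac{\mathbf r}{r} \right) . \tag{31} $$

As many tests have shown [38], five different values of $\alpha$ and
n = 0, 1, 2, 3 form a sufficiently accurate basis set for all elements up to the
third row of the periodic table. Since including further functions does not
yield any significant changes, this basis can be considered converged.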

Using (31), we perform a self-consistent solution of modified atomic Kohn- 
Sham equations: 

$$ \left[ \hat t + V^{\rm psat}(r) \right] \phi_\nu(\mathbf r) = \epsilon_\nu^{\rm psat}\, \phi_\nu(\mathbf r) , \tag{32} $$

$$ V^{\rm psat}(r) = V_{\rm nucleus}(r) + V_{\rm Hartree}[n(r)] + V_{\rm xc}^{\rm LDA}[n(r)] + \left( \frac{r}{r_0} \right)^{N} . \tag{33} $$

$V_{\rm xc}$ is expressed in terms of the local-density approximation as
parameterized by Perdew and Zunger [39]. The additional term $(r/r_0)^N$
appearing in $V^{\rm psat}(r)$ in (33) was first introduced by Eschrig et al.
[33, 38] in order to improve band-structure calculations performed within LCAO.
It forces the wavefunctions to avoid areas far away from the nucleus, thus
resulting in an electron density that is compressed in comparison to the free
atom. The parameter N has only a rather small influence on the results; we
choose N = 2 for all types of atoms. The radius $r_0$ may be optimized to yield
best results; however, we have found that $r_0 \approx 2 r_{\rm cov}$ is usually
a good choice, where $r_{\rm cov}$ is the covalent radius of the element.



4.2 Calculation of Matrix Elements 

We use the solutions $\phi_\nu$ of (32) as basis functions for the LCAO treatment of the
system. Within a minimal basis description only valence orbitals are considered. 
As an approximation, we write the one-electron potential of the many-atom 
structure as a sum of spherical atomic contributions: 

$$ V_{\rm eff}(\mathbf r) = \sum_k V_0^k(|\mathbf r - \mathbf R_k|) , \tag{34} $$

where $V_0^k$ is the Kohn-Sham potential of a neutral pseudoatom due to its
compressed electron density, but no longer containing the additional term
$(r/r_0)^N$. This equation differs from the one used in older studies
[31, 34, 35] where
the potentials of free neutral atoms were used to evaluate the matrix elements. 
Using the potentials of compressed pseudoatoms for the evaluation of the matrix 
elements has two advantages: 

— Numerous self-consistent calculations on molecules and solids have shown 
that the electron densities in these structures can be roughly approximated 
as a superposition of compressed atomic densities. Thus, by using this infor- 
mation, we anticipate the results of a more sophisticated calculation up to 
a certain extent. In addition to that, as has already been shown by Seifert 
et al. [32], the densities due to superposed pseudoatomic potentials are even 
more realistic than a simple superposition of pseudoatomic densities. 

— The necessary integral approximations work better if one uses basis func- 
tions that decay more rapidly than those of the free atom. Furthermore, 
Eschrig [33] has shown that the modified wavefunctions form a better ba- 
sis set in condensed-matter applications. Similar ideas on confined orbitals 
have been discussed more recently by Jansen and Sankey [40] and Chetty 
et al. [41]. 

The overlap matrix consists only of two-center elements and can be calcu- 
lated in a straightforward way. Consistent with (34), one can neglect several 
contributions to the Hamiltonian matrix elements [31] yielding: 

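$$ H_{\mu\nu} = \begin{cases} \epsilon_\mu^{\rm free\ atom} & \text{if } \mu = \nu , \\ \langle \phi_\mu^A | \hat t + V_0^A + V_0^B | \phi_\nu^B \rangle & \text{if } A \neq B , \\ 0 & \text{otherwise.} \end{cases} \tag{35} $$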
The indices A and B indicate the atom on which the wavefunctions and the
potentials are centered. As can be seen easily, only two-center Hamiltonian matrix
elements are dealt with. Approximation (35) may be seen as an LCAO variant 
of a cellular Wigner-Seitz method as applied for instance by Inglesfield [42]. As 
follows from (35), the eigenvalues of the free atom serve as diagonal elements of 
the Hamiltonian, thus guaranteeing the correct limit for isolated atoms. 

Due to the fact that all matrix elements depend only on interatomic dis- 
tances, we need to calculate them only once for each pair of atom types. For the 
two-center integral evaluation, the analytic formula of Eschrig [43] is applied. 
Matrix elements corresponding to a given interatomic distance can easily be ob- 
tained by interpolating between the stored values. Therefore, the creation of the 
Hamiltonian requires about the same time as common TB models. The calculation
time is mainly determined by the efficiency of the diagonalization routines.
We are still using Householder and QL algorithms but the implementation of 
recently developed linear-scaling methods [44] is in progress. 
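
The table lookup mentioned above can be sketched as follows. This is a minimal
illustration that assumes the matrix elements are stored on a uniform distance
grid and that linear interpolation is accurate enough; the grid layout and the
function name are hypothetical, and a production code might well use a
higher-order interpolation.

```python
def interpolate(table, r_start, dr, r):
    """Linearly interpolate a quantity tabulated at r_start + n * dr.

    Valid for r_start <= r < r_start + (len(table) - 1) * dr.
    """
    x = (r - r_start) / dr
    n = int(x)
    if x < 0.0 or n >= len(table) - 1:
        raise ValueError("distance outside the tabulated range")
    w = x - n
    return (1.0 - w) * table[n] + w * table[n + 1]


if __name__ == "__main__":
    # Hypothetical table of one matrix element at R = 1.0, 1.5, ..., 3.0.
    table = [2.0, 3.0, 4.0, 5.0, 6.0]
    print(interpolate(table, 1.0, 0.5, 1.75))
```

Because the tables depend only on the pair of atom types, they are computed
once and reused for every geometry visited during a simulation.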



4.3 Fitting of Short-Range Repulsive Part 

The construction of the Hamiltonian as described in the previous subsection
allows us to calculate the band structure energy $E_{\rm BS}$
non-self-consistently. Thus, the short-range repulsive part $V_{\rm rep}(R)$ can
easily be determined as the difference of the total energy resulting from a
self-consistent calculation and $E_{\rm BS}$ for different values of the
interatomic distance R:

$$ V_{\rm rep}(R) = E_{\rm tot}^{\rm scf}(R) - E_{\rm BS}(R) . \tag{36} $$

We write $V_{\rm rep}(R)$ as a sum of polynomials:

$$ V_{\rm rep}(R) = \begin{cases} \sum_{n=2}^{N_P} d_n (R_c - R)^n & \text{for } R < R_c , \\ 0 & \text{otherwise.} \end{cases} \tag{37} $$

Equation (37) guarantees $V_{\rm rep}(R)$ to be zero for $R > R_c$ and a smooth
behavior at the cutoff radius $R_c$. In many cases, this expression is
sufficient to fit the points given by (36) at $N_P = 5$.
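
The form (37) is simple to evaluate. The following Python sketch shows the
potential and its first derivative; the coefficients and the cutoff radius used
in the example are hypothetical numbers, not fitted values.

```python
def v_rep(r, coeffs, r_cut):
    """Eq. (37): sum over n = 2..N_P of d_n (R_c - R)^n for R < R_c, else 0.

    coeffs holds [d_2, d_3, ..., d_NP].
    """
    if r >= r_cut:
        return 0.0
    x = r_cut - r
    return sum(d * x ** n for n, d in enumerate(coeffs, start=2))


def v_rep_derivative(r, coeffs, r_cut):
    """dV_rep/dR; it vanishes at R_c because the sum starts at n = 2."""
    if r >= r_cut:
        return 0.0
    x = r_cut - r
    return -sum(n * d * x ** (n - 1) for n, d in enumerate(coeffs, start=2))


if __name__ == "__main__":
    coeffs = [1.0, 0.5, 0.0, 0.2]  # d_2 ... d_5, illustrative only
    print(v_rep(2.0, coeffs, 3.0), v_rep_derivative(2.0, coeffs, 3.0))
```

Starting the sum at n = 2 is what makes both the potential and the resulting
force go to zero continuously at the cutoff.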

In most cases, diatomic molecules can be used to fit $V_{\rm rep}(R)$. However, these
small systems sometimes tend to show level crossings causing sudden changes 
of orbital occupation numbers (as long as occupation numbers are restricted 
to integers) and thus discontinuities in the first derivatives of the energies. This 
behavior makes a reasonable fit in the vicinity of the level crossing almost impos- 
sible. Fortunately, one is not restricted to diatomic molecules. Other information 
on systems with uniform bond length available from self-consistent calculations 
can be included in the fit too, e.g., equilibrium crystalline phases. 



5 Vibrational Properties



For a reliable vibrational analysis one has to use fully relaxed model structures 
(amorphous bulk supercells, surface slabs or finite clusters/molecules), in which
the forces do not exceed a minimal critical value. 

The vibrational properties of an atomic arrangement may be derived within 
the harmonic approximation by a calculation of eigenvectors and eigenvalues 
of the dynamical matrix. This matrix is the matrix of the total energy second 
derivatives with respect to the nuclear coordinates and may be constructed in 
the following way. If one displaces each atom ^ {i = by Ar from 

its equilibrium position into the directions of the three basis vectors (a = 
1,2,3) of the Cartesian coordinate system and into the corresponding opposite 
directions, one can calculate the elements of the dynamical matrix using 
the forces Pj,||e^ acting on each atom j in the directions (/3 = 1, 2, 3): 



ip— r'-i- 

JMN/3 






2Ar 



(38) 



The signs refer to the two possible displacements of atom $i$ in the direction
$\pm\mathbf{e}_\alpha$. As can be seen from a Taylor expansion of the total energy,
(38) eliminates errors of $O(\Delta r)$ in the elements of $H$. To find a value
for $\Delta r$ in practical applications, one has to consider two sources of
error: higher-order terms in the total energy expansion, which favor a very
small $\Delta r$, and the numerical instability of expression (38) for very small
displacements. We found $\Delta r = 0.02\,a_{\mathrm{B}}$ (Bohr radii) to be a
reasonable choice.
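The central-difference construction of (38) can be sketched in a few lines of Python. The two-atom harmonic spring used as a force model, its parameters `k` and `r0`, and all function names are illustrative assumptions and not part of the DF-TB scheme; only the difference formula itself is taken from the text.

```python
def spring_forces(x, k=1.0, r0=1.5):
    """Forces on two atoms on a line joined by a harmonic spring (toy model)."""
    stretch = (x[1] - x[0]) - r0
    return [k * stretch, -k * stretch]


def dynamical_matrix(x_eq, force_fn, dr=0.02):
    """Central-difference estimate H[i][j] = (F_j^- - F_j^+) / (2 dr).

    x_eq is a flat list of equilibrium coordinates; force_fn returns the
    force on every coordinate for a given configuration.
    """
    n = len(x_eq)
    hessian = [[0.0] * n for _ in range(n)]
    for i in range(n):
        plus = list(x_eq)
        minus = list(x_eq)
        plus[i] += dr    # displacement along +e_alpha
        minus[i] -= dr   # displacement along -e_alpha
        f_plus = force_fn(plus)
        f_minus = force_fn(minus)
        for j in range(n):
            hessian[i][j] = (f_minus[j] - f_plus[j]) / (2.0 * dr)
    return hessian


H = dynamical_matrix([0.0, 1.5], spring_forces)
```

For this purely harmonic toy model the result is independent of `dr`; for a realistic energy surface the choice of `dr` trades truncation error against numerical noise, as discussed above.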

After symmetrizing the dynamical matrix and projecting out translational
and rotational modes, we solve the generalized eigenvalue problem

$$ H\,\mathbf{y} = \omega^2 M\,\mathbf{y}, \qquad (39) $$

where $M$ denotes a matrix with the atomic masses on the main diagonal, whereas
$\omega^2$ and $\mathbf{y}$ are the eigenvalues and their corresponding eigenvectors.
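Because the mass matrix is diagonal and positive, the generalized problem (39) can be reduced to an ordinary symmetric eigenvalue problem by mass weighting. The following short derivation is a standard step added here for illustration; the symbols $\mathbf{u}$, $\tilde{H}$, and $m_i$ (the mass of atom $i$) are our notation and do not appear in the original text.

```latex
\begin{aligned}
\mathbf{u} &= M^{1/2}\,\mathbf{y}, \\
\tilde{H} &= M^{-1/2}\,H\,M^{-1/2},
  \qquad \tilde{H}_{i\alpha,j\beta} = \frac{H_{i\alpha,j\beta}}{\sqrt{m_i\,m_j}}, \\
\tilde{H}\,\mathbf{u} &= \omega^2\,\mathbf{u}.
\end{aligned}
```

Substituting $\mathbf{y} = M^{-1/2}\mathbf{u}$ into (39) and multiplying from the left by $M^{-1/2}$ gives the last line, so any standard symmetric eigensolver can be applied to $\tilde{H}$.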

From this we obtain the total vibrational density of states (VDOS)

$$ g(\omega) = \sum_{i=1}^{3N} \delta(\omega - \omega_i). \qquad (40) $$



For any practical evaluation of such an expression we have to replace
Dirac's delta function by an approximation $F(\lambda, \omega - \omega_i)$ (Gaussian,
Lorentzian, ...), where the parameter $\lambda$ controls the width of the
eigenfrequencies within the spectra. Going beyond this, for the purpose of
comparison with experimental results we can convolve the spectra with the
resolution function of the specified detector. From the mathematical point of
view we are also able to define the following projection technique for the
calculation of the partial VDOS $g_\Pi(\omega)$ for any given index set of
coordinates $\Pi = \{s\} \subseteq \{1, \ldots, 3N\}$. Restricting oneself to
normalized eigenvectors, which guarantees the strict additivity of the various
partial VDOS to the resulting total VDOS, we get

$$ g_\Pi(\omega) = \sum_{s \in \Pi} \sum_{i=1}^{3N} \left| (\mathbf{p}_s|\mathbf{y}_i) \right|^2 \delta(\omega - \omega_i), \qquad (41) $$

where we have to sum over all elements of the index set and
$(\mathbf{p}_s|\mathbf{y}_i)$ is the scalar product of the unit vector
$\mathbf{p}_s = (0, \ldots, 0, 1, 0, \ldots, 0)$, which has $s-1$ zeros before and
$3N-s$ zeros after the entry 1, with the $i$th eigenvector $\mathbf{y}_i$. This
turns out to be a helpful tool to characterize changing vibrational contributions of
chemically different atom groups within the systems. In an amorphous carbon
system, for example, we project out the partial VDOS of atoms of equal
hybridization type. We can do the same to select defect atoms in a crystal or
cluster. Another important example is the description of surface-related phonons
of two-dimensional periodic slab models. Here we have to use the method to
separate the surface excitations from bulk-like modes. To avoid errors connected
with the finite depth of such slab models, one has to saturate the dangling bonds
of the bottom crystalline layer with hydrogen and keep these atoms fixed at
their equilibrium positions by giving them infinite masses. This simulates the
continuous transition from bulk-like to surface-like behavior.
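The broadening and projection steps described above can be sketched as follows. This is a minimal illustration that assumes a normalized Gaussian as the approximation to the delta function and eigenvectors normalized to unit length; the function names, the 0-based coordinate indices, and the two-mode example data are our own choices.

```python
import math


def gaussian(x, width):
    """Normalized Gaussian used in place of the Dirac delta function."""
    return math.exp(-0.5 * (x / width) ** 2) / (width * math.sqrt(2.0 * math.pi))


def vdos(omega, freqs, width):
    """Total VDOS at frequency omega: one broadened peak per eigenfrequency."""
    return sum(gaussian(omega - w, width) for w in freqs)


def partial_vdos(omega, freqs, modes, index_set, width):
    """Partial VDOS for the coordinates listed in index_set (0-based).

    modes[i] is the normalized eigenvector belonging to freqs[i]; its squared
    components on the selected coordinates weight the peak of mode i.
    """
    total = 0.0
    for w, y in zip(freqs, modes):
        weight = sum(y[s] ** 2 for s in index_set)
        total += weight * gaussian(omega - w, width)
    return total


r = 1.0 / math.sqrt(2.0)
freqs = [100.0, 200.0]            # illustrative eigenfrequencies
modes = [[r, r], [r, -r]]         # normalized eigenvectors of a two-coordinate toy system
g_total = vdos(150.0, freqs, 20.0)
g_first = partial_vdos(150.0, freqs, modes, [0], 20.0)
g_second = partial_vdos(150.0, freqs, modes, [1], 20.0)
```

Because the eigenvectors are normalized, `g_first + g_second` reproduces `g_total`, which is the additivity property mentioned in the text.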

Furthermore, determining the degree of localization of the vibrations
gives useful information about the topological defect structure of the
systems. The definition of a localization measure, such as an inverse participation
ratio of the $j$th mode,

$$ P_j^{-1} = \sum_{i=1}^{3N} \left( y_{j,i} \right)^4, \qquad (42) $$

with $y_{j,i}$ the $i$th component of the normalized eigenvector $\mathbf{y}_j$,
enables the discussion of much interesting physics around structural defects in
amorphous and crystalline systems.
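As a small illustration of such a localization measure, the sketch below evaluates one common form of the inverse participation ratio, the sum of the fourth powers of the eigenvector components divided by the squared norm. This particular form and the function name are assumptions made for the example; the value is 1 for a mode confined to a single coordinate and 1/n for a mode spread evenly over n coordinates.

```python
def inverse_participation_ratio(mode):
    """Sum of fourth powers of the components, normalized by the squared norm."""
    norm_sq = sum(c * c for c in mode)
    return sum(c ** 4 for c in mode) / norm_sq ** 2


localized = inverse_participation_ratio([1.0, 0.0, 0.0, 0.0])   # mode on one coordinate
extended = inverse_participation_ratio([0.5, 0.5, 0.5, 0.5])    # evenly spread mode
```

Modes with a large value are candidates for vibrations localized at defects.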

6 Simulation Geometries and Regimes 

In addressing applications to systems of various scales, from clusters to solids and
solid surfaces, we have to describe the geometries and regimes used for the
simulation.



6.1 Clusters, Molecules 

In the case of small clusters and molecules we are dealing with finite systems in
which, depending on the interaction radius, each atom interacts with all other
atoms. Consequently, the potential energy surface $E_{\mathrm{tot}}(\{\mathbf{R}_i\})$
in the $(3N-6)$-dimensional configuration space $\{\mathbf{R}_i\}$ is highly complex,
developing a large number of extrema and saddle points even for a few atoms.
This fact is already manifested in systems with only simple two-particle
interactions and becomes even more obvious if the atoms interact in a more
complicated way.

To determine the minimum-energy configurations of the most stable structures
theoretically, complex search strategies based on chemical intuition or simulated
annealing (SA) techniques have to be used. In the former case, starting
structures that are believed to be close to the configurations we are looking
for are constructed from experience with similar systems. The atom
configurations may then be relaxed into the nearest lower-lying local energy
minimum by well-established gradient search methods. The annealing MD
simulations are much more efficient and less restricted because a larger area of
configuration space is scanned. Thus, it is more likely that global minima will
be found with these methods [45]. In the SA regime, the clusters are heated to
temperatures high enough to produce high atom mobilities and rearrangements.
About every 500 time steps, an output cluster is generated, which is gradually
cooled down by simply rescaling the velocities, followed by a final relaxation
using conjugate gradient techniques. The resulting set of clusters or molecules,
all consisting of the same number of atoms, is then analyzed by comparing the
total energies of the different structures to determine the energetic order of
the isomers and to select the most stable ones for a more detailed investigation
of their physico-chemical properties.
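The heat, cool, and relax sequence can be illustrated on a deliberately tiny problem. The sketch below is not the cluster simulation of the text: it assumes a single particle of unit mass on a one-dimensional double-well energy surface, cools by multiplying the velocity by a constant factor after every step, and uses plain steepest descent as a stand-in for the conjugate gradient relaxation. All names and parameter values are illustrative.

```python
def potential(x):
    """Toy double-well energy surface with two unequal minima (assumption)."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Derivative of the toy potential."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def anneal(x, v, dt=0.01, cooling=0.999, md_steps=20000, relax_steps=5000, step=0.01):
    """Hot MD run with gradual velocity rescaling, then a downhill relaxation."""
    force = -gradient(x)
    for _ in range(md_steps):
        # velocity-Verlet step for a unit mass
        v += 0.5 * dt * force
        x += dt * v
        force = -gradient(x)
        v += 0.5 * dt * force
        # gradual cooling: remove a little kinetic energy at every step
        v *= cooling
    for _ in range(relax_steps):
        # final relaxation by steepest descent (stand-in for conjugate gradients)
        x -= step * gradient(x)
    return x


x_start = 0.96                     # close to the higher-lying minimum
x_final = anneal(x_start, v=3.0)   # initial velocity large enough to cross the barrier
```

The run ends in one of the two minima of the toy surface. Repeating it with different starting velocities and comparing the final energies mimics, in miniature, the ranking of isomers described above.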



6.2 Bulk-Crystalline and Amorphous Solids

To simulate bulk-solid structures and their properties, one has to remove all
surfaces from the calculations by constructing three-dimensional periodic
supercells (for example, simple cubic) incorporating an appropriately large
number of atoms ($N > 100$) for the problem to be studied. The artificially
introduced translational symmetry is described by supercell translation vectors,
which generate the atom positions in the surrounding 26 cells from the atom
coordinates in the center cell. With such a construction, the investigation of
structural properties or of effects of interatomic order, as in amorphous
systems, is restricted to the range of the supercell size. To describe such
systems properly, the interaction of all atoms in one cell with all other atoms
falling into the interaction range has to be considered. This implies that
interactions between atoms in neighboring cells constructed via a supercell
translation are included if the interaction radius extends into a neighboring
cell. In this simple version of a model with periodic boundary conditions, the
interaction radius has to be chosen smaller than half the side length of the
cubic supercell; otherwise interactions would be counted multiple times.
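The restriction on the interaction radius is what makes the usual minimum-image bookkeeping valid: each pair of atoms then interacts through at most one periodic image. A minimal sketch for a cubic supercell follows; the function name, the tuple layout of the result, and the example coordinates are our own choices.

```python
import math


def neighbors_within(positions, box, cutoff):
    """Pairs (i, j, distance) closer than cutoff in a cubic periodic cell.

    Uses the minimum-image convention, which is only valid for
    cutoff < box / 2, the condition stated in the text.
    """
    if cutoff >= 0.5 * box:
        raise ValueError("cutoff must be smaller than half the box length")
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dist_sq = 0.0
            for axis in range(3):
                d = positions[j][axis] - positions[i][axis]
                d -= box * round(d / box)   # shift to the nearest periodic image
                dist_sq += d * d
            if dist_sq <= cutoff * cutoff:
                pairs.append((i, j, math.sqrt(dist_sq)))
    return pairs


# two atoms on opposite faces of a cell of side 10 are neighbors across the boundary
pairs = neighbors_within([(0.5, 5.0, 5.0), (9.5, 5.0, 5.0)], box=10.0, cutoff=2.0)
```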

For the generation of a metastable bulk amorphous model structure, we use
a simulation regime in which the system is propagated along a path of constant
energy in phase space by the algorithms (6-9). The initial positions fix the
contribution of the potential energy to the total energy, and the velocities
determine the kinetic contribution. The common technique for searching for
amorphous equilibrium structures by computer simulations is then to quench or
cool a liquid phase to room temperature over a certain time interval in the
picosecond to nanosecond range. If we choose, as the simplest case, the volume
of the supercell to be constant during the search, and adjust the system at each
temperature to a given constant energy, we maintain a microcanonical ensemble
during the MD run, $N, V, E$ = constant.

In the first step of the simulation, we equilibrate the liquid phase by
adjusting the system to a correspondingly high energy. This is accomplished by
scaling the velocities until the temperature of the liquid phase has been
established by equilibration. This can be monitored by checking the validity of
the virial theorem (equipartition of averaged potential and kinetic energy) and
by following the temperature, which should fluctuate around the mean value
according to the thermal energy defined by the equipartition theorem. Proceeding
in steps of 500 K, the structure is then cooled down until the final energy
corresponding to an amorphous phase at room temperature is reached. At each
temperature step, the system has to be partly equilibrated. Algorithmically, the
equilibration procedure is: (i) integrate the equations of motion, (ii) compute
the kinetic and potential energies, (iii) if the kinetic energy is not equal to
the desired value, scale the velocities, (iv) repeat until the system has
reached equilibrium.
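Steps (i) to (iv) can be sketched for a toy system. The example below assumes a chain of unit masses coupled by harmonic springs between fixed walls, a velocity-Verlet integrator, and a relative tolerance on the kinetic energy; it runs for a fixed number of steps instead of testing for equilibrium. None of these choices come from the text, which only prescribes the loop structure.

```python
import math


def kinetic_energy(v, mass=1.0):
    """Total kinetic energy of the chain."""
    return 0.5 * mass * sum(vi * vi for vi in v)


def chain_forces(x, k=1.0):
    """Harmonic nearest-neighbor chain between fixed walls (toy model).

    x holds the displacements of the atoms from their rest positions.
    """
    n = len(x)
    forces = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        forces.append(k * (left - 2.0 * x[i] + right))
    return forces


def equilibrate(x, v, target_ke, dt=0.05, steps=2000, tol=0.05):
    """Velocity-rescaling loop following steps (i) to (iv)."""
    f = chain_forces(x)
    for _ in range(steps):                      # (iv) repeat
        # (i) integrate the equations of motion (velocity Verlet, unit masses)
        v = [vi + 0.5 * dt * fi for vi, fi in zip(v, f)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        f = chain_forces(x)
        v = [vi + 0.5 * dt * fi for vi, fi in zip(v, f)]
        # (ii) compute the kinetic energy
        ke = kinetic_energy(v)
        # (iii) rescale the velocities if the kinetic energy is off target
        if ke > 0.0 and abs(ke - target_ke) > tol * target_ke:
            scale = math.sqrt(target_ke / ke)
            v = [scale * vi for vi in v]
    return x, v


x0 = [0.0] * 8
v0 = [0.1 * (i + 1) for i in range(8)]
x1, v1 = equilibrate(x0, v0, target_ke=4.0)
```

In a production run one would stop when the rescaling is no longer triggered over a long stretch of steps and then continue without rescaling, so that the subsequent trajectory conserves energy.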



6.3 Surfaces and Adsorbates 



For surface configurations, the annealing MD simulations for determining
(meta)stable surface structures make use of conventional two-dimensional
boundary conditions in a three-dimensional formulation. The periodicity in the
direction normal to the surface is strongly oversized, making the system
practically finite in this direction. The surface slabs themselves are composed
of several (10 to 20) atomic monolayers, each one consisting of a sufficiently
large number of atoms. The two bottom layers of the slab should be held fixed to
simulate the infinite crystalline substrate. Additionally, all dangling orbitals
that occur have to be removed by saturation with hydrogen. The stability and
dynamical restructuring are then studied on the remaining top layers, including
possible adsorbates and finite temperatures.

In performing MD simulations of periodic supercells, one generally has to
calculate the energies and forces by sampling values at different k-points
in the Brillouin zone. However, by making the supercells much larger than the
primitive ones, i.e., including a large number of atoms in the supercell, the
evaluation of physical quantities at only the Γ-point (k = 0) is practically
equivalent to a summation over a collection of different k-points in the
primitive cell. For that reason, the calculation of interatomic forces at only
the Γ-point is a valid approach in MD applications if the supercell is chosen
large enough.
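The equivalence between Γ-point sampling of a supercell and k-point sampling of the primitive cell can be checked explicitly on a toy model. The sketch below assumes a one-dimensional, one-orbital tight-binding chain with hopping `t` and lattice constant `a`; this model and all names are illustrative and unrelated to the DF-TB Hamiltonian. It verifies that every Γ-point state of an N-site periodic supercell is a plane wave whose energy equals the primitive-cell band at one of the folded points k_n = 2πn/(Na).

```python
import cmath
import math


def supercell_hamiltonian(n_sites, t=1.0):
    """Gamma-point Hamiltonian of a periodic ring of n_sites (toy model)."""
    h = [[0.0] * n_sites for _ in range(n_sites)]
    for j in range(n_sites):
        h[j][(j + 1) % n_sites] -= t
        h[j][(j - 1) % n_sites] -= t
    return h


def primitive_band(k, t=1.0, a=1.0):
    """Band energy of the one-site primitive cell at wave vector k."""
    return -2.0 * t * math.cos(k * a)


def folded_k_points(n_sites, a=1.0):
    """Primitive-cell k-points that fold onto the supercell Gamma point."""
    return [2.0 * math.pi * n / (n_sites * a) for n in range(n_sites)]


def residual(h, k, t=1.0, a=1.0):
    """Largest component of H v - eps(k) v for the plane wave v_j = exp(i k j a)."""
    n = len(h)
    v = [cmath.exp(1j * k * j * a) for j in range(n)]
    eps = primitive_band(k, t, a)
    return max(
        abs(sum(h[row][col] * v[col] for col in range(n)) - eps * v[row])
        for row in range(n)
    )


h8 = supercell_hamiltonian(8)
worst = max(residual(h8, k) for k in folded_k_points(8))   # zero up to rounding
```

The larger the supercell, the denser the set of folded k-points, which is why the Γ-point alone becomes sufficient for large cells.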
7 Accuracy and Transferability

To demonstrate the accuracy and transferability of the described DF-TB MD method
we now present results on small clusters, molecules, and crystalline solid
modifications, and compare with experimental data as well as with results of more
sophisticated calculations on the basis of SCF-LDA and HF (MP) theory, the latter
treating correlation effects in Møller-Plesset perturbation theory.



7.1 Small Silicon Clusters, Si_n

As one prominent example of small clusters that have been broadly investigated
during the past decade by experiment [46] and theory [47, 48], we discuss
the equilibrium configurations of silicon clusters Si_n with n = 2-10 [49]. While
for small carbon clusters the minimal-energy configurations are either linear
chains or cyclic structures, the situation in the silicon case is more complex.
In contrast to carbon, silicon does not form strong π-bonds, but can be found
more than fourfold coordinated. Therefore, many metastable isomers have to be
considered even for small clusters.

Because of this variety of possible equilibrium structures, the determination
of the lowest-energy geometry and the corresponding ground-state properties is a
good test for any method. Whereas empirical potentials fitted to bulk crystalline
structures usually fail to describe small finite atom configurations correctly, we
obtain the same minimal-energy clusters as Raghavachari et al. within ab initio
calculations (HF/MP4) [47], see Fig. 1. There are two exceptions. For Si8 we find
a distorted octahedron with two neighboring faces capped (C2 symmetry) at almost
the same energy as the distorted octahedron with two opposite faces capped
(C2h symmetry) reported by ab initio calculations. To clarify which of these two
clusters is more stable, we have additionally performed a self-consistent
calculation with a generalized gradient functional [50] and found our
new cluster to be only 0.05 eV/atom higher in energy. In the case of Si9, we
confirm a distorted tricapped trigonal prism (C2v symmetry), which was proposed
recently by Ordejon [51], to be the most stable configuration.

Figure 2 shows the (zero-point corrected) cohesive energies of the minimal-energy
clusters compared to ab initio and experimental results. To account for
the spin polarization of the isolated atoms, we have corrected our energies by
0.64 eV/atom, a value that can be determined within a self-consistent
local-spin-density calculation. The deviations of our results from the
experimental values are very small and comparable to those of the ab initio
calculations at the 6-31G* level [47].

7.2 Molecules, Hydrocarbons 

Carbon shows many different types of bonding. All of them can be found in
the huge class of hydrocarbon molecules, which are known to play an important
role in the structure formation of many systems. For that reason, it is
important to know how a method performs on these systems. In addition,
hydrocarbons are well understood and one can refer to an abundance of
experimental and theoretical data. For all the properties tested here, accurate
self-consistent calculations are available [52, 53, 54]. Table 1 shows the
ground-state geometries of the radicals CH, CH2, and CH3 and the molecules H2,
CH4, C2H2, C2H4, C2H6, C6H6, cyclopropene C3H4, cyclopropane C3H6, and n-butane
C4H10. We want to note here that according to experiments all the radicals are
spin-polarized; therefore, a direct comparison is difficult for these
structures. This is especially true for CH2, where the absence of spin in our
model leads to a more stable singlet state. For that reason, we compare our
geometries for this radical with spin-unpolarized calculations and the
experimentally observable singlet state.

Fig. 2. Cohesive energy of small silicon clusters. Our results (DF-TB) compared to
experimental and ab initio results at different levels of theory

As in SCF-LDA calculations, the C-C single bond lengths are about
0.02 Å too short in comparison to experimental data, whereas the H-H bond
length is about 0.02 Å too long (Table 1). The double bond in C2H4
is about 0.01 Å too short, whereas the triple bond in C2H2 has almost the
same length as known from experiments. Structures with very small C-C-C
bond angles, such as cyclopropane and cyclopropene, are also well described.
C-H bonds are systematically overestimated by about 0.03 Å, a little more
than the overestimation in SCF-LDA calculations. Bond angles agree within 2°
or better; exceptions are the radicals, where the H-C-H angles are clearly
underestimated by about 4°.

Table 1. Geometric properties (bond lengths XY in Å and bond angles XYZ in degrees)
obtained for selected radicals and molecules. The SCF and experimental values have
been taken from [54] (H2 through ethane) and [53] (cyclopropene through benzene).
GGA values refer to calculations using generalized gradient approximations for the
exchange-correlation functional as described in [54] and [53]

Molecule              Variable   DF-TB    LSD      GGA      Exp.
H2                    HH         0.765    0.765    0.748    0.741
CH                    CH         1.138    1.152    1.108    1.120
CH2 (singlet)         CH         1.134    1.135    1.117    1.111
                      HCH        98.6     99.1     99.1     102.4
CH3                   CH         1.114    1.093    1.090    1.079
                      HCH        116.8    120.0    120.0    120.0
CH4                   CH         1.116    1.101    1.100    1.086
C2H2 (acetylene)      CC         1.206    1.212    1.215    1.203
                      CH         1.099    1.078    1.073    1.061
C2H4 (ethene)         CC         1.503    1.513    1.541    1.526
                      CH         1.119    1.105    1.104    1.088
                      CCH        116.3    116.4    116.2    117.8
C2H6 (ethane)         CC         1.503    1.513    1.541    1.526
                      CH         1.119    1.105    1.104    1.088
                      HCH        108.0    107.2    107.5    107.4
C3H4 (cyclopropene)   C1C2       1.318    1.305    -        1.296
                      C2C3       1.509    1.510    -        1.509
                      C1H        1.109    1.091    -        1.072
                      HC1C2      148.4    149.5    -        149.9
C3H6 (cyclopropane)   CC         1.503    1.504    -        1.510
                      CH         1.114    1.095    -        1.089
C4H10 (n-butane)      C1C2       1.511    1.517    -        1.533
                      C2C3       1.520    1.532    -        1.533
C6H6 (benzene)        CC         1.389    1.396    -        1.399
                      CH         1.114    1.095    -        1.089

Table 2 shows atomization energies (including zero-point corrections) and
reaction energies for some typical hydrocarbon reactions. The atomization energies
are given with respect to free, spin-polarized atoms; that is, the spin-polarization
energies of free carbon and hydrogen atoms (1.13 eV and 0.90 eV, respectively,
calculated within LSDA) have been subtracted from the actual atomization
energies determined by the DF-TB method. The atomization energies taken
from [52] are already corrected for zero-point vibrations. For the selected
reaction energies presented here, accurate calculations and experimental values are
available from [53] and [54]. In this comparison, our atomization energies are
found to be very good, whereas the self-consistently calculated reaction energies
show better error cancellation.

Table 2. Atomization and reaction energies for some typical reactions of organic
chemistry. Atomization energies are with respect to the free spin-polarized atoms
(see text). Calculated energies are not corrected for zero-point vibrations, but
experimental values are extrapolated to zero temperature and corrected for
zero-point vibrations. All reference energies were taken from [52] (atomization)
and [53] (reactions)

Atomization energies (kcal/mol)

Molecule                        DF-TB     HF    LSD    GGA   Exp.
H2                                113     84    113    105    109
CH4                               425    332    463    422    424
C2H2                              422    300    461    417    408
C2H4                              577    431    634    574    568
C2H6                              735    557    795    720    719
C6H6                             1438   1041   1577   1413   1375
Rms err/bond                      3.5   27.4   12.9    2.5

Reaction energies (kcal/mol)

Reaction                        DF-TB     HF    LSD    GGA   Exp.
C2H6 + H2 -> 2 CH4                  2     21     18     19     19
C2H4 + 2 H2 -> 2 CH4               47     64     67     60     57
C2H2 + 3 H2 -> 2 CH4               89    118    131    114    105
Rms error                        14.3    8.6   16.1    5.5

C2H4 + 2 CH4 -> 2 C2H6             43     22     32     22     20
C2H2 + 4 CH4 -> 3 C2H6             84     54     77     58     49
C3H4 + 3 CH4 -> 2 C2H6 + C2H4      87     50     50     44     45
C3H6 + 3 CH4 -> 3 C2H6             63     26     26     23     25
Rms error                        35.2    3.7   15.4    4.7
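The reaction energies in Table 2 follow from the atomization energies by simple bookkeeping: the energy released is the atomization energy of the products minus that of the reactants. The short sketch below reproduces the DF-TB entries from the DF-TB atomization energies of the table; the dictionary layout and function name are our own.

```python
# DF-TB atomization energies in kcal/mol, copied from Table 2.
ATOMIZATION = {"H2": 113, "CH4": 425, "C2H2": 422, "C2H4": 577, "C2H6": 735}


def reaction_energy(reactants, products, atomization=ATOMIZATION):
    """Energy released by a reaction, computed from atomization energies.

    reactants and products map a molecule name to its stoichiometric coefficient.
    """
    bound_products = sum(n * atomization[m] for m, n in products.items())
    bound_reactants = sum(n * atomization[m] for m, n in reactants.items())
    return bound_products - bound_reactants


hydrogenation_ethane = reaction_energy({"C2H6": 1, "H2": 1}, {"CH4": 2})      # 2
hydrogenation_ethene = reaction_energy({"C2H4": 1, "H2": 2}, {"CH4": 2})      # 47
hydrogenation_ethyne = reaction_energy({"C2H2": 1, "H2": 3}, {"CH4": 2})      # 89
ethene_to_ethane = reaction_energy({"C2H4": 1, "CH4": 2}, {"C2H6": 2})        # 43
```

For C2H2 + 4 CH4 -> 3 C2H6 the same arithmetic gives 83 kcal/mol, one unit below the tabulated 84, which is consistent with the atomization energies having been rounded to integers.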





7.3 Solid Crystalline Modifications, Silicon 

During the past few years, there has been progress in the application of
parameterized TB models to solid-state modifications of silicon [28, 55] and other
group-IV semiconductors [27, 56]. For comparison, the results of accurate
self-consistent methods are available for the group-IV equilibrium structures and
some higher-coordinated high-pressure modifications [57, 58]. Considering the
results of all empirical TB schemes, no general transferability is obtained for all
systems. Either the solids or the small clusters are described with high accuracy,
but never both with a quality similar to that obtained with more sophisticated
methods.

In addressing possible applications of the DF-TB method to bulk systems and
crystalline surfaces, we have also performed calculations of the total energy as a
function of nearest-neighbor distance for the diamond silicon lattice and for
the high-pressure crystalline phases sc, bcc, and fcc. The results are displayed
in Fig. 3 compared to the ab initio data of Yin and Cohen [57, 58]. The
equilibrium distance of the diamond crystal (2.346 Å) is close to the reference
value (2.360 Å) and the experimental result (2.351 Å). The energy decrease for the
simple cubic phase is close to the SCF data, whereas the energy differences to
the other high-pressure phases are overestimated by about 30%.

Fig. 3. Cohesive energies per atom for different lattice types versus nearest-neighbor
distance (in Å), obtained with respect to the spin-unpolarized silicon atom. On the
left the energy values at the equilibrium positions are compared to ab initio values [57]

As an additional benchmark, we have calculated the vibrational density of
states of a 216-atom diamond-Si supercell, which is plotted in Fig. 4 in
comparison to inelastic neutron-scattering data obtained on a polycrystalline
sample [59]. The calculated spectrum has been convolved with the experimental
resolution function. We find the most characteristic modes with appropriate
intensities at the correct wave numbers, and the overall width of the vibrational
spectrum is in good agreement with the experimental data.

Fig. 4. Vibrational density of states for crystalline silicon using a convolution with the
experimental resolution function (solid line) compared to inelastic neutron scattering
data [59] (dashed line). The intensities have been scaled to a maximum value of one

8 Applications

8.1 Structure and Stability of Polymerized C60

Since the discovery of the C60 molecule [60], there has been a growing interest
in fullerene-based carbon systems. At room temperature, the only weakly
interacting molecules in solid C60 form a face-centered cubic (fcc) lattice. Recent
experiments [61, 62] provide convincing evidence that ultraviolet (UV) and
visible light causes the molecules in solid C60 to polymerize. The phototransformed
material is no longer soluble in toluene, and the associated mass spectra consist
of integer multiples of the C60 mass. Further, the characteristic infrared and
Raman modes of pristine C60 show significant shifts and splittings when the
material is phototransformed, and new infrared and Raman modes appear in the
spectra. In particular, the high-energy A_g(2) pentagonal pinch mode shifts by
about 10 cm^-1 from 1469 cm^-1 in the pristine solid to 1459 cm^-1 in the
phototransformed structure. In addition, a new low-energy Raman peak appears at
118 cm^-1. With respect to the stability of this structure, experiments show that
the phototransformed material returns to pristine C60 at approximately 200 °C.
Further analysis of the experimental data leads to the conclusion that the
energy barrier for the thermal dissociation of the polymerized material is about
1.25 eV [61].

A number of earlier theoretical works [63, 64, 65] have also investigated
the phenomenon of C60 dimerization. In these publications, it has been
established that the energetically most favorable bonding between two C60 molecules
is accomplished by a 66/66 2+2 cycloaddition reaction, which leads to nearly
rigid molecules that are bound together by a four-membered square ring as
displayed in Fig. 5. However, although empirical and generalized tight-binding (TB)
methods [63] determine the polymerized structure to be less stable than isolated
C60, the Harris functional approach [64], Hartree-Fock [63], and an all-electron
self-consistent LDA calculation [65] come to the opposite conclusion. One major
problem is that, due to the rather large size of the dimer, accurate self-consistent
methods relied on partially relaxed geometries. Since the expected energy
difference will be close to zero, this effect may alter the qualitative result of the
calculation. As such, it is desirable to derive the dimer binding energy using an
accurate, self-consistent scheme and geometries that are fully relaxed.




Fig. 5. Ground-state geometry of the C60 dimer as calculated with the DF-TB method



To investigate the problems discussed above, we have applied both DF-TB
and first-principles DF methods. Geometry optimizations were performed by a
conjugate gradient algorithm using the DF-TB energy functional as described
in [30].

In our calculations we have considered different oligomers of C60, among
them linear structures and a ring formed by four bucky balls. However, the basic
physics can be understood if one considers only the [C60]2 dimer. The monomers
of all polymeric compounds are bound by a 66/66 2+2 cycloaddition as suggested
in previous studies [63, 64, 65]. For the structures investigated here, it is found
that the shape of the resulting four-membered rings is almost independent of the
number of polymeric connections between a particular C60 monomer and other
monomers. We determine a length of 1.583 Å for the intermolecular bonds and a
length of 1.590 Å for the intramolecular bonds. Further, the bond length between
the four fourfold coordinated atoms and their threefold coordinated neighbors
is 1.514 Å. These values are in excellent agreement with those of [64], who found
1.588 Å, 1.578 Å, and 1.511 Å, respectively.

Table 3 displays the cohesive energies of the five polymeric compounds with
respect to isolated C60. Except for minor differences, DF-TB leads to an energy
gain of about 0.3 eV per polymer bond. This is in accord with the findings of
Adams et al. [64], who found an energy gain of 0.47 eV per polymer bond for
the dimer and 0.44 eV per polymer bond for the infinite chain. Based on these
results, it should be possible to create stable polymeric solids that are at
least two-dimensional by photopolymerizing the material.
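The figure of about 0.3 eV per polymer bond can be checked directly against the DF-TB cohesive energies of Table 3. In the sketch below, the energies are copied from the table, while the number of inter-ball connections assigned to each structure (one for the dimer, two for both trimers, three for the linear tetramer, four for the square) is our reading of the geometries and should be treated as an assumption.

```python
# (DF-TB cohesive energy in eV relative to isolated C60, number of connections)
# Energies are copied from Table 3; the connection counts are assumed.
STRUCTURES = {
    "C120 (dimer)": (0.30, 1),
    "C180 (lin. chain)": (0.58, 2),
    "C180 (L-shaped)": (0.60, 2),
    "C240 (lin. chain)": (0.86, 3),
    "C240 (square)": (1.20, 4),
}


def energy_per_connection(structures=STRUCTURES):
    """Cohesive energy divided by the number of inter-ball connections."""
    return {name: energy / count for name, (energy, count) in structures.items()}


per_bond = energy_per_connection()
```

All five values fall between 0.28 and 0.31 eV, which supports the statement that the energy gain is nearly additive in the number of connections.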

Table 3. Cohesive energies ΔE and zero-point corrected cohesive energies ΔE_zpc
for the investigated structures with respect to isolated C60 as determined by
the DF-TB, SCF-LDA, and SCF-GGA methods. Note that ΔE is defined as
ΔE = E([C60]_N) - N × E(C60)

                                ΔE [eV]                ΔE_zpc [eV]
Structure             DF-TB    SCF-LDA    SCF-GGA      DF-TB
C120 (dimer)          0.30     1.20       0.32         0.32
C180 (lin. chain)     0.58     -          -            0.61
C180 (L-shaped)       0.60     -          -            0.63
C240 (lin. chain)     0.86     -          -            0.90
C240 (square)         1.20     -          -            1.26

While energetic stability is one figure of merit for potential applications of
fullerene-assembled materials, the more interesting parameter is the energy that
is required to break a fullerene polymer apart. Since a reaction barrier is
encountered when the dimer forms, the observed dissociation energy should be larger
than the binding energy. An estimate of this barrier for the C60 dimer
dissociation has been determined within the DF-TB approach. We have looked at two
different dissociation pathways for the 120-atom complex. For the first path,
both interball bonds were fixed in each step at a certain length while the rest of
the cluster was allowed to relax. We find that the balls dissociate as the interball
bond length becomes larger than 2.16 Å. The activation energy for this path is
1.9 eV. For the second path, we have fixed only one of the two interball bonds
and allowed the rest of the structure to relax. When the length of the
fixed bond exceeds 2.62 Å, the other bond breaks spontaneously and the balls
separate. The activation energy for this path is only 1.6 eV, which is in reasonable
agreement with the experimental estimate of 1.25 eV [61]. Adams et al. [64] found
an upper limit of 2.4 eV for this energy. Further, the fact that the second bond
breaks spontaneously once the first one is broken is in accord with
the results presented in [64], where a single-bond connection between two balls
was found to be unstable.

Finally, we looked at the changes in the vibrational density of states
that occur as a result of polymerization. In excellent agreement with experiment,
we find that the A_g(2) pentagonal pinch mode shifts downward by 10 cm^-1. The
vibrational behavior of the larger oligomers shows peaks shifted downward
by the same magnitude and by 16 cm^-1 and 20 cm^-1. Further analysis of
this phenomenon leads to a very simple model: balls with ring connections to
only one neighbor shift down by 10 cm^-1, balls with two connections at an angle
of 90° (such as the balls in the [C60]4 square) by 16 cm^-1, and those with two
connections at opposite sides of the balls (such as the ones in the middle of a
chain) by 20 cm^-1. Consequently, the vibrational behavior in the high-frequency
region is mainly determined by the local bonding environment of the balls. From
this fact we can draw the conclusion that the solids on which the experiments
have been performed consist mainly of dimer-like structures.

8.2 Stability of Highly Tetrahedral Amorphous Carbon, ta-C

Carbon in combination with hydrogen is one of the most promising chemical
elements for molecular structure design in nature. An almost infinite richness
of possible structures with a wide variety of physical properties can be
produced. Even the two crystalline inorganic modifications, graphite and diamond,
show diametrically opposite physical properties. Whereas graphite with its
typical layered sp2 bonding planes is black, soft, lubricating, electrically
conducting, and absorbing, the fully three-dimensionally connected, sp3-hybridized,
brightly sparkling diamond is light, hard, brittle, electrically insulating, and
transparent. Graphite behaves as a semimetal due to a nonvanishing density of
states (DOS) near the Fermi energy produced by delocalized π-states. In contrast,
a large σ-band gap makes diamond behave like an insulator, although it is also
known as one of the best heat conductors.

Successful handling of different deposition methods allows the preparation
of metastable amorphous carbon structures with interesting physical properties
between those of diamond and graphite [66]. “Tetrahedral amorphous carbon”
(ta-C) [67], which was recently grown by McKenzie et al. using ion-beam techniques,
is promising for future technological applications due to its diamond-like
properties. The material is extremely hard, chemically inert, develops large band
gaps [68], and may be n-type doped by phosphorus [69] and nitrogen [68].

The growing interest in a fundamental investigation of structure-property
relations and mechanisms of structure formation in carbon-based amorphous
materials has led to recent applications of molecular-dynamics simulations [25,
70, 71, 72]. As the result of our simulations, we have obtained final metastable
amorphous carbon modifications at different mass densities. The relaxation of
supercell structures containing 128 atoms has been realized by applying a “rapid”
dynamical cooling of a partly equilibrated liquid at a cooling rate of 10^15 K/s
over 2 ps under constant-volume conditions. In all finally obtained structures
there is a clear tendency for the different hybrids to separate from each other
and to form small interconnected subclusters. Owing to the fixed composition and
constant atom number in the supercells, the cohesive energies at different
densities can be compared directly; this comparison identifies 3.0 g/cm^3 as a
magic density at which the most stable amorphous carbon modifications are formed.
In Fig. 6 we have plotted the cohesive energies of all a-C models versus density;
as a reference, we note that the diamond cohesive energy per atom within the
present DF-TB scheme is -8.02 eV. For a comprehensive description of how the
structures, the chemical bonding properties, and the global band gap properties
change with the simulation regime, we refer the reader to [72]. Here we focus
only on some important facts. For illustration, we show in Fig. 7 an image of
the structure and in Fig. 8 the electronic density of states (DOS), complemented
by the vibrational density of states (VDOS). The latter spectrum has been split
into the different hybrid contributions.

The results confirm the stability of high-density tetrahedral amorphous
carbon ta-C [68], which has been deposited in different laboratories by various
techniques. This is also supported independently by another ab initio
MD method [25]. At this density, the internal strain in the network is maximally
reduced by the separation of small, favorably even-membered π clusters
between undercoordinated sites, yielding a minimal defect concentration. The
fraction of fourfold coordinated atoms reaches about two thirds, which is very
close to the experimentally obtained values. The increase in the π-π* splitting
up to 3.0 eV compared to low-density materials is mainly determined by the shift
of the size distribution of π clusters toward smaller ones and by the ability of
the π bonds to relax to a mean value for the p-p-π overlap of 0.6 - 0.7 times
the overlap in C2H4. There is a balance between the gain of π-bonding energy
due to sp2 clustering and the residual stress in the amorphous network, favoring
the high stability at the considered density.

Comparing the simulated diffraction data of our 3.0 g/cm^3 model in both 
momentum and real space with neutron-diffraction experiments, we find very 
good agreement. Additionally, we have calculated the vibrational spectra to allow 
for further comparison with Raman and infrared spectroscopic data. For the total 
VDOS of our magic-density model we find a characteristic half-sphere shape in 
a frequency range ω between 250 and 1500 cm^-1, in complete support of similarly 
shaped spectra obtained by Drabold et al. [25] and Wang et al. [71] using different 
methods. The most surprising property of the spectra is the complete loss of 
all reminiscence of the split graphite and diamond behavior. This is a remarkable 
difference from the comparable situation of a-Si near the crystalline density, where 
the VDOS shows a typical diamond-lattice vibrational behavior. The dominating 
π interaction in carbon, which persists even at the considerable density of 
3.0 g/cm^3, might be responsible for this difference. The small band features above 
1500 cm^-1 are due to strongly localized stretch-type modes caused by embedding 
of undercoordinated sp2 units (pairs, ...) in a rigid sp3 bonding environment. 

Fig. 7. Structural image of the 3.0 g/cm^3 model (the legend distinguishes 
sp3-bonded and sp2-bonded carbon atoms) 



8.3 Diamond Surface Reconstructions 

An atomistic understanding of growth-related properties of various diamond sur- 
faces is becoming important in order to achieve an optimization of deposition 
conditions and the control of the surface chemistry for the production of high- 
quality diamond films. Regarding this, the combination of surface-sensitive ex- 
perimental techniques such as high resolution electron-energy-loss spectroscopy 
(HREELS) [73, 74], scanning tunneling microscopy (STM) [75] and theoretical 
modeling [76, 77, 78, 79, 80, 81] becomes particularly helpful to develop ideas 
about growth mechanisms at various metastable surface modifications. 
Fig. 8. Electronic and vibrational data for the high-density ta-C model. (a) Total 
(solid line) electronic density of states (DOS) split into orbital contributions 
(s-DOS dotted line, p-DOS dashed line). (b) Total (solid line) and hybrid-fractional 
(dotted and dashed lines, one for each of the sp2 and sp3 hybrid contributions) 
vibrational density of states (VDOS) 



For reasons of limited space we only focus on some important results about 
stability and vibrational properties of diamond (111) reconstructions. For more 
detailed information about the (100) and (111) diamond surface models we refer 
the reader to [79, 82, 83]. More recent studies about growth mechanisms forced 
by CHx-radical adsorption are discussed elsewhere [84, 85]. 

After termination of growth, three hydrogen-terminated (111) surface 
reconstructions, shown in Fig. 9 with their related surface vibrational densities of 
states (SVDOS), are found to be stable on (111) facets. The typical STM observation 
of hydrogen-terminated (1x1) bulk-like structures on as-grown (111) facets 
is confirmed by the C(111)(1x1):H surface model. In the absence of hydrogen, 
the (2x1) reconstructed π-bonded Pandey-chain model has been established as 
the most stable configuration [76, 79], giving an energy gain of 0.7 eV/surface 
atom relative to the unreconstructed bulk situation. Hydrogenating the chains 
maintains the (2x1) Pandey-chain reconstruction, now forming a metastable 
surface state shown in the C(111)(2x1)PC:H model. This geometry is less stable 
than the hydrogenated bulk C(111)(1x1):H structure by 0.6 eV/surface atom. 
In recent MD adsorption studies of CH3 radicals onto (111) diamond, another 
stable surface layer reconstruction has been found. Whereas a (1x1) adsorption 
of CH3 radicals on the clean (111) diamond surface leads to steric interference 
between the hydrogens [86], an alternating adsorption of CH3 at every second 
surface site, as discussed by Sasaki and Kawarada [87], is found to be highly 
stable. In this case, the adsorbed CH3 species may form a (2x2) reconstruction, 
compare Fig. 9. In this surface configuration, the subsurface C atoms 
that bind the adsorbed CH3 radicals are sp3 bonded, whereas the remaining 
C atoms of the subsurface have one dangling bond, which favors a tendency towards 
local graphitization. If these dangling bonds are saturated by hydrogen, as shown 
in the C(111)(2x2):CH3 model, the diamond surface is again completely stabilized. 
The theoretical confirmation of various hydrogenated reconstruction types that 
were observed in experiments on as-grown (111) facets may provide important 
information about successive growth steps. 

For comparison with recent HREELS studies of the (111) diamond surface 
we have calculated and discussed the VDOS of various surface reconstructions 
[83]. In the following, we only briefly list some of the most characteristic 
properties. 



C(111)(1x1):H. The distinct and sharp peak centered at about 3000 cm^-1 
is exclusively due to C-H stretching vibrations. Towards lower frequencies, the 
band region of ω ≈ 1000-1500 cm^-1 is occupied by various types of C-H bending 
modes. Most of them are coupled to sublattice excitations. Within a lower-frequency 
band region, 500-900 cm^-1, we additionally assign various (C-H) complex 
vibrations. We find pure translational motions of the (C-H) complexes parallel 
to the surface as well as pure bouncing vibrations. All theoretically obtained 
features may be related to their measured counterparts from HREELS experiments 
[73]. 

C(111)(2x2):CH3. Comparing with the surface VDOS of the H-terminated 
C(111)(1x1):H surface, we find two characteristic differences. Instead of an intense 
and sharp peak in the high-frequency region, the feature located around 3000 cm^-1 
is clearly split into two peaks centered at 2990 and 3060 cm^-1, respectively. As 
the origin of the two distinct features we identify the C-H stretching vibrations of 
the CH3 adsorbates. While asymmetric C-H stretching of the CH3 groups occurs 
at the highest frequencies around 3060 cm^-1, the symmetric stretching modes 
cause the lower-frequency peak at 2990 cm^-1. Another interesting feature, caused 
by very soft translational modes of the entire CH3 radical parallel to the surface, 
is visible at about 250 cm^-1. 

Fig. 9. Surface projected vibrational density of states (SVDOS) and structural images 
of the C(111)(1x1):H, C(111)(2x1)PC:H and C(111)(2x2):CH3 reconstruction 
models 

The results are in good agreement with recent HREELS studies of epitaxially 
grown diamond (111) by Aizawa et al. [88]. These authors conclude from their 
off-specular HREELS spectra of as-grown (111) diamond that the high-frequency 
peak splitting is clear evidence for the presence of CH3 groups. Additionally, they 
report an acoustic surface mode in the lower-frequency region which may be 
discussed in correlation to our CH3 translational modes parallel to the surface. 
On the basis of the experimental data, the authors have predicted a coexistence 
of C-CH3 and C-H on the surface. The appearance of the acoustic phonon mode 
may serve as an important indication for possible high symmetry ordering of the 
adsorbed CH3, as predicted by the C(111)(2x2):CH3 model. 



C(111)(2x1)PC:H. The existence of σ-bonded chains in the top surface layer 
is responsible for most of the observable phonon modes. The single distinct 
peak centered at 2970 cm^-1 is due to symmetric and asymmetric C-H stretching 
and compressing modes, in very good agreement with experiments and theoretical 
predictions [81, 89, 90]. The broad band region ω ≈ 1200-1500 cm^-1 is dominated 
by hindered rotational vibrations of C-H complexes within the chains. 
Towards lower frequencies, we find a transition from rocking- and bending-like to 
scissoring-like behavior. The most intense modes dominating this band, e.g., the 
peak maximum at 1318 cm^-1, are due to C-H complex bending vibrations. For 
example, a symmetric in-phase shearing translation of C-H complexes within 
the chains may be responsible for the experimentally observed peaks between 
1240 and 1300 cm^-1, see [89, 90, 91]. The most significant modes belonging to 
the lower frequencies, ω < 1200 cm^-1, again represent vibrations in which the 
C-H complexes of the chains move as a whole. 

9 Summary 

Throughout this review we have described basic principles for total energy and 
interatomic force calculations in molecular-dynamics simulations of structure 
formation of real materials. Discussing the advantages and shortcomings in the 
use of classical empirical potentials and fully self-consistent density-functional 
calculations, we outline a third “hybrid” method - a density-functional based 
tight-binding molecular dynamics, which combines the efficiency of empirical 
concepts with the accuracy of scf calculations. 

Considering the simple and straightforward ab initio concept for the con- 
struction of the electronic Hamiltonian within a nonorthogonal two-center ap- 
proach, all tests and applications on various-scale homo- and heteronuclear sys- 
tems confirm a transferability which is almost comparable to scf-LDA calcu- 
lations. This opens the possibility for performing highly predictive computer 
simulations in molecular and solid structure design. 
Abstract. The description of the finite element method (FEM) for the Stokes problem 
is considered. Boundary conditions that are important in fluid mechanics are discussed. 
The condition of incompressibility leads to a saddle-point problem. The approximation 
by a mixed finite element method requires the choice of suitable finite elements. 
Otherwise the computation suffers from instability and useless results are produced. 



1 Introduction 



In fluid mechanics the basic conservation equations of mass, momentum, and 
energy are usually made tractable by neglecting terms that are assumed to be 
small. In this way examples of the standard types of partial differential equa- 
tions, i.e., elliptic, parabolic, and hyperbolic, are obtained. These three types of 
differential equations have different characteristics. Hence different initial and/or 
boundary conditions are necessary to formulate a well-posed problem. 

An analytical solution can be found in only very few cases, even for sim- 
plified physical problems. The application of different numerical approximation 
methods has therefore a long tradition in fluid mechanics. 

In recent years the finite element method has found an increasing use and a 
wider acceptance for the solution of the equations governing viscous incompress- 
ible fluid flows. The advantages are a well-established mathematical foundation, 
great geometrical flexibility, and the possibility to design general computer im- 
plementations. Nowadays, there are computer codes having the capability of 
simulating complex industrial processes. On the other hand the FEM applica- 
tions in fluid mechanics are subject to further research investigations. 

The different types of partial differential equations have special properties, 
which have to be considered in the numerical treatment. Here we consider the 
elliptic Stokes equation, which plays a fundamental role in approximation prob- 
lems in the field of fluid mechanics. Furthermore, the theory of FEM for this 
problem is very well developed. In particular, the problem of the condition of 
incompressibility has been studied. 

In the following, we shall give an outline of FEM for the Stokes problem. A 
strongly mathematical study can be found in the literature. We especially refer 
to the comprehensive books of [1, 4, 5, 8]. 
2 Stokes Equation 



2.1 Conservation Equations 



The domain $\Omega$ is occupied by an incompressible fluid with density $\rho$, and we 
consider a stationary process. If we denote the velocity by $\mathbf{u}$, the conservation 
of mass gives 

$$\nabla \cdot \mathbf{u} = 0 , \tag{1}$$

and when we neglect convective terms (creeping flow) the conservation of momentum 
reads 

$$-\nabla \cdot \boldsymbol{\sigma} = \rho \mathbf{f} . \tag{2}$$

Here $\mathbf{f}$ is an external force on a unit volume and $\boldsymbol{\sigma}$ denotes the total or Cauchy 
stress tensor. This tensor can be split into a pressure term $p$ and the extra stress 
tensor $\boldsymbol{\tau}$ due to viscosity, 

$$\sigma_{ij} = -p\,\delta_{ij} + \tau_{ij} . \tag{3}$$

The Newtonian constitutive equation relates the stresses with the viscosity $\eta$ by 

$$\tau_{ij} = \eta\,(u_{i,j} + u_{j,i}) . \tag{4}$$

The abbreviation $u_{i,j} = \partial u_i/\partial x_j$ denotes the partial derivative of $u_i$ with respect 
to the coordinate $x_j$. From (1)-(4) we derive the Stokes equation 

$$-\eta \Delta \mathbf{u} + \nabla p = \rho \mathbf{f} , \tag{5}$$

where the symbols $\nabla$ and $\Delta$ denote the classical gradient and Laplacian operators. 
In (5) the pressure is determined only up to an additive constant. Usually, in 
the theory it is normalized by 

$$\int_\Omega p \, dx = 0 ; \tag{6}$$

in practice $p$ is prescribed at one point. 
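
As a quick plausibility check of (1) and (5), the following sketch verifies numerically that plane Poiseuille flow between two walls solves the Stokes equation with vanishing external force. The parameter values, the function names, and the finite-difference step are assumptions made for this illustration only; they do not come from the text.

```python
# Plane Poiseuille flow between the walls y = 0 and y = H (illustrative data).
eta = 2.0   # viscosity (assumed value)
G = 3.0     # constant pressure drop per unit length, p = -G * x (assumed value)
H = 1.0     # channel height (assumed value)

def u1(x, y):
    """Stream-wise velocity: parabolic profile vanishing at both walls."""
    return G / (2.0 * eta) * y * (H - y)

def u2(x, y):
    """Cross-stream velocity, identically zero."""
    return 0.0

def p(x, y):
    """Pressure, linear in the stream-wise coordinate."""
    return -G * x

def d1(f, x, y, axis, h=1e-3):
    """Central first difference of f along axis 0 (x) or axis 1 (y)."""
    if axis == 0:
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def laplace(f, x, y, h=1e-3):
    """Five-point finite-difference approximation of the Laplacian of f."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h)
            - 4 * f(x, y)) / h**2

def stokes_residual(x, y):
    """Residuals of both components of (5) with f = 0, and of (1)."""
    r1 = -eta * laplace(u1, x, y) + d1(p, x, y, 0)
    r2 = -eta * laplace(u2, x, y) + d1(p, x, y, 1)
    div = d1(u1, x, y, 0) + d1(u2, x, y, 1)
    return r1, r2, div

residuals = [stokes_residual(0.3, yy) for yy in (0.2, 0.5, 0.8)]
print(residuals)
```

All residuals vanish up to rounding, because the viscous term $-\eta\,\Delta u_1 = G$ is balanced exactly by the pressure gradient $\partial p/\partial x_1 = -G$.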



2.2 Function Spaces and Variational Formulation 

In the FEM the fundamental relation is the variational or weak formulation of 
the differential problem. The unknown functions are looked for in some function 
spaces and the equations are (scalar) multiplied by some test functions. 

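Let $\Omega$ be a bounded open set in $\mathbb{R}^n$, $n = 2$ or 3, with a sufficiently regular 
boundary $\Gamma$. As usual, we denote by 

$$H^1(\Omega) = \{ q \in L^2(\Omega) : Dq \in L^2(\Omega) \} \tag{7}$$

the space of functions with square-integrable derivatives of order $k = 0, 1$ on $\Omega$. Here 
$L^2(\Omega)$ is the space of square-integrable functions on $\Omega$ and $Dq$ symbolizes a 
(generalized) derivative of $q$. Consider the boundary-value problem 

$$-\eta \Delta \mathbf{u} + \nabla p = \rho \mathbf{f} , \quad \nabla \cdot \mathbf{u} = 0 , \quad \mathbf{u}|_\Gamma = \mathbf{g} , \quad \text{with } \int_\Gamma \mathbf{g} \cdot \mathbf{n} \, d\Gamma = 0 , \tag{8}$$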
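where the function $\mathbf{g}$ describes the velocity on the boundary $\Gamma$. We look for $\mathbf{u}$ in 

$$V_g = \{ \mathbf{v} \in (H^1(\Omega))^n : \nabla \cdot \mathbf{v} = 0 , \ \mathbf{v}|_\Gamma = \mathbf{g} \} \tag{9}$$

and $p$ in 

$$L_0^2(\Omega) = \Big\{ q \in L^2(\Omega) : \int_\Omega q \, dx = 0 \Big\} . \tag{10}$$

The last condition for $\mathbf{g}$ in (8) stems from application of the Gauss theorem 
to the continuity equation, with $\mathbf{n}$ the outer normal on $\Gamma$: 

$$0 = \int_\Omega \nabla \cdot \mathbf{u} \, dx = \int_\Gamma \mathbf{u} \cdot \mathbf{n} \, d\Gamma = \int_\Gamma \mathbf{g} \cdot \mathbf{n} \, d\Gamma . \tag{11}$$

This is a necessary condition on the boundary velocities for the existence of a 
divergence-free velocity field in $\Omega$. 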

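Multiplying the first equation in (8) by test functions $\mathbf{v} \in V_0$, defined by 
(9) with $\mathbf{g} = 0$, and applying the Gauss theorem, we obtain the weak formulation 

$$\begin{aligned}
\int_\Omega (-\eta u_{i,jj} + p_{,i} - \rho f_i)\, v_i \, dx
&= \int_\Omega \big( -(\eta u_{i,j} v_i)_{,j} + (p v_i)_{,i} + \eta u_{i,j} v_{i,j} - p v_{i,i} - \rho f_i v_i \big) \, dx \\
&= \int_\Gamma (-\eta u_{i,j} n_j v_i + p v_i n_i) \, d\Gamma + \int_\Omega (\eta u_{i,j} v_{i,j} - p v_{i,i} - \rho f_i v_i) \, dx \\
&= \int_\Omega (\eta u_{i,j} v_{i,j} - \rho f_i v_i) \, dx = 0 .
\end{aligned} \tag{12}$$

Here, and subsequently, we employ the usual summation convention, i.e., if an 
index is repeated, summation over this index is implied. The surface integral 
is zero due to homogeneous boundary values and the pressure term vanishes 
because the functions $\mathbf{v} \in V_0$ are divergence free. Given 

$$a(\mathbf{u},\mathbf{v}) := \eta \int_\Omega u_{i,j} v_{i,j} \, dx , \qquad (\mathbf{f},\mathbf{v}) := \int_\Omega f_i v_i \, dx , \tag{13}$$

the weak formulation of (8) reads: 

$$\text{find } \mathbf{u} \in V_g \text{ with } a(\mathbf{u},\mathbf{v}) = \rho\,(\mathbf{f},\mathbf{v}) \quad \forall \mathbf{v} \in V_0 . \tag{14}$$

If $\mathbf{u}_g \in V_g$ is a known extension of the boundary function $\mathbf{g}$ to $\Omega$, we have 
$\mathbf{u} = \mathbf{u}_0 + \mathbf{u}_g$, $\mathbf{u}_0 \in V_0$. With the right-hand side 

$$l(\mathbf{v}) = \rho\,(\mathbf{f},\mathbf{v}) - a(\mathbf{u}_g,\mathbf{v}) , \tag{15}$$

problem (14) reads as a problem with homogeneous boundary conditions: 

$$\text{find } \mathbf{u}_0 \in V_0 \text{ with } a(\mathbf{u}_0,\mathbf{v}) = l(\mathbf{v}) \quad \forall \mathbf{v} \in V_0 . \tag{16}$$

The form $a(\mathbf{u},\mathbf{v})$ is bilinear and symmetrical. Therefore the problem is equivalent 
to the variational problem 

$$J(\mathbf{v}) := \tfrac{1}{2}\, a(\mathbf{v},\mathbf{v}) - l(\mathbf{v}) \to \inf_{V_0} . \tag{17}$$

This is seen by writing the Euler equations: for vanishing variation of $J(\mathbf{v})$, 

$$\delta J(\mathbf{u})\big|_{\mathbf{v}} := \lim_{t \to 0} \frac{1}{t} \{ J(\mathbf{u} + t\mathbf{v}) - J(\mathbf{u}) \} = 0 , \tag{18}$$

we find (16). This equivalence gives the name 'variational formulation' to (16). 
More mathematical details can be found in the standard literature cited above. 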
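
The step from (18) to (16) can be written out explicitly. The short derivation below is a sketch that uses only the bilinearity and symmetry of a( , ) and the linearity of l; the nonnegativity of a(v,v) follows from its definition (13).

```latex
\begin{aligned}
J(u_0 + t v) &= \tfrac{1}{2}\, a(u_0 + t v,\, u_0 + t v) - l(u_0 + t v) \\
             &= J(u_0) + t \,\bigl[ a(u_0, v) - l(v) \bigr] + \tfrac{t^2}{2}\, a(v, v) , \\
\delta J(u_0)\big|_{v} &= \lim_{t \to 0} \frac{1}{t}\,\bigl\{ J(u_0 + t v) - J(u_0) \bigr\}
             = a(u_0, v) - l(v) .
\end{aligned}
```

Setting the variation to zero for all admissible v reproduces (16). Because a(v,v) is nonnegative, the remaining quadratic term shows that the stationary point is a minimum of J.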

We note that such a relation to a variational problem exists only in the case 
of a bilinear and symmetrical functional a( , ), i.e., the differential equation 
problem has to be linear and symmetrical. This is not fulfilled, for example, for 
the Navier-Stokes equations due to the convective terms, which are nonlinear. 

However, the derivation of the weak formulation is always possible. The name 
results from the fact that, in general, the solution of (17) or (16) is not sufficiently 
regular to satisfy (8) in the classical sense (pointwise). 

2.3 Saddle Point Problem 

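We have assumed that the functions $\mathbf{v} \in V_0$ are (pointwise) divergence free, $\nabla \cdot \mathbf{v} = 0$. 
This can be weakened: 

$$b(\mathbf{v},q) := -\int_\Omega q \, \nabla \cdot \mathbf{v} \, dx = 0 , \quad \forall q \in L_0^2(\Omega) . \tag{19}$$

Replacing $V_g$ by 

$$\tilde{V}_g = \{ \mathbf{v} \in (H^1(\Omega))^n : b(\mathbf{v},q) = 0 \ \ \forall q \in L_0^2(\Omega) , \ \mathbf{v}|_\Gamma = \mathbf{g} \} \tag{20}$$

and repeating the procedure concerning (12) with $\mathbf{v} \in \tilde{V}_0$ we find the results (13) 
- (17), i.e., the variational problem 

$$J(\mathbf{v}) := \tfrac{1}{2}\, a(\mathbf{v},\mathbf{v}) - l(\mathbf{v}) \to \inf_{\tilde{V}_0} . \tag{21}$$

Substituting $\tilde{V}_0$ by its definition, we can rewrite this as a constrained minimisation 
problem: 

$$\tfrac{1}{2}\, a(\mathbf{v},\mathbf{v}) - l(\mathbf{v}) \to \inf_{X_0} , \quad \text{with } b(\mathbf{v},q) = 0 , \ \forall q \in L_0^2(\Omega) . \tag{22}$$

Here 

$$X_0 = \{ \mathbf{v} \in (H^1(\Omega))^n : \mathbf{v}|_\Gamma = 0 \} . \tag{23}$$

Using the Lagrangian multiplier, we obtain a minimisation problem without 
constraints: 

$$\mathcal{L}(\mathbf{v},q) := J(\mathbf{v}) + b(\mathbf{v},q) = \tfrac{1}{2}\, a(\mathbf{v},\mathbf{v}) - l(\mathbf{v}) + b(\mathbf{v},q) \to \inf_{\mathbf{v} \in X_0} \sup_{q \in L_0^2} . \tag{24}$$

We remark that the usual Lagrangian multiplier $\lambda$ is implied in the function 
$q$. Such a problem is called a saddle point problem because the solution is a 
minimum with respect to $\mathbf{v}$ and a maximum with respect to $q$. For vanishing 
variation, 

$$\delta \mathcal{L}(\mathbf{u},p)\big|_{(\mathbf{v},q)} := \lim_{t \to 0} \frac{1}{t} \{ \mathcal{L}(\mathbf{u} + t\mathbf{v}, p + tq) - \mathcal{L}(\mathbf{u},p) \} = 0 , \tag{25}$$

we find 

$$\begin{aligned}
a(\mathbf{u},\mathbf{v}) + b(\mathbf{v},p) &= l(\mathbf{v}) , \quad \forall \mathbf{v} \in X_0 , \\
b(\mathbf{u},q) &= 0 , \quad \forall q \in L_0^2 .
\end{aligned} \tag{26}$$

It can be seen that the pressure $p$ plays the role of the Lagrangian multiplier of 
the constraint that the velocity has to be divergence free. Sommerfeld obtained 
this relation by applying the Hamilton method [9]. 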
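
The structure of (22)-(26) can be illustrated with a finite-dimensional analogue: minimise a quadratic functional under a linear constraint and read off the Lagrangian multiplier. The matrices, the right-hand side, and all names below are toy data assumed for this sketch; they are not a discretization of the Stokes problem.

```python
def solve(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    a = [row[:] + [r] for row, r in zip(M, rhs)]   # augmented copy
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[piv] = a[piv], a[k]
        for i in range(k + 1, n):
            fac = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= fac * a[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x

# Toy data (assumed): symmetric positive definite A, one constraint row in C.
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
C = [[1.0, 1.0, 1.0]]          # constraint: the components of v sum to zero
l = [1.0, 0.0, 0.0]
n, m = len(A), len(C)

# Stationarity of J(v) + q^T C v gives the block system
#   A u + C^T p = l,   C u = 0,
# the discrete counterpart of (26).
K = [A[i] + [C[k][i] for k in range(m)] for i in range(n)]
K += [C[k] + [0.0] * m for k in range(m)]
sol = solve(K, l + [0.0] * m)
u, p = sol[:n], sol[n:]

def J(v):
    """Quadratic functional 1/2 v^T A v - l^T v, the analogue of (17)."""
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    return 0.5 * sum(v[i] * Av[i] for i in range(n)) - sum(l[i] * v[i] for i in range(n))

print("u =", u, "multiplier p =", p, "J(u) =", J(u))
```

Any other vector that satisfies the constraint gives a larger value of J, while the multiplier p measures the force needed to enforce the constraint, in the same way as the pressure enforces the divergence condition.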

The first component of the solution $(\mathbf{u},p)$ of (26) is always a solution of (21), 
but the uniqueness of the Lagrangian parameter in (24) or (26), i.e., the equivalence 
to (22), can be guaranteed only if the following inf-sup or LBB condition 
(Ladyzhenskaya-Babuska-Brezzi) is satisfied: 

$$\inf_{q \in L_0^2} \ \sup_{\mathbf{v} \in X_0} \frac{b(\mathbf{v},q)}{\|\mathbf{v}\| \, \|q\|} \ge \beta > 0 . \tag{27}$$



Here $\|\cdot\|$ denotes the norm in the spaces $X_0$ and $L_0^2$, respectively. This condition 
is fulfilled by the continuous Stokes problem but is not automatically satisfied by 
the discretized equations. It is a condition on the approximations of the spaces 
$X_0$ and $L_0^2$, which cannot be chosen arbitrarily. 

It is interesting to note that problem (26) is also obtained if we 
multiply the first equation in (8) by $\mathbf{v} \in X_0$ and the second one by $q \in L_0^2$. In 
such a simple manner, FEM had been realized long before the relation to the 
saddle point problem and the inf-sup condition were revealed. 



2.4 General Boundary Conditions 



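In fluid mechanics we often find boundary conditions that are more complicated 
than the Dirichlet conditions for the velocity assumed above. To solve the Stokes 
problem with general boundary conditions we have to reconsider (2). Let us 
multiply this equation by an arbitrary but sufficiently regular function $\mathbf{v}$: 

$$\begin{aligned}
0 = \int_\Omega (-\sigma_{ij,j} - \rho f_i)\, v_i \, dx
&= \int_\Omega \big( -(\sigma_{ij} v_i)_{,j} + \sigma_{ij} v_{i,j} - \rho f_i v_i \big) \, dx \\
&= \int_\Gamma -\sigma_{ij} n_j v_i \, d\Gamma + \int_\Omega \big( \sigma_{ij}\, \tfrac{1}{2} (v_{i,j} + v_{j,i}) - \rho f_i v_i \big) \, dx .
\end{aligned} \tag{28}$$

In manipulating the second term on the right-hand side, we have used the symmetry 
of the stress tensor $\sigma_{ij}$. For a Newtonian fluid we obtain, with (3) and (4), 

$$\int_\Omega \sigma_{ij}\, \tfrac{1}{2} (v_{i,j} + v_{j,i}) \, dx = \int_\Omega \frac{\eta}{2} (u_{i,j} + u_{j,i})(v_{i,j} + v_{j,i}) \, dx - \int_\Omega p\, v_{i,i} \, dx . \tag{29}$$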
In the surface integral in (28) we find the stress vector $\vec{\sigma} = \boldsymbol{\sigma} \cdot \mathbf{n}$. This vector can 
be split into a normal and a tangential component: $\vec{\sigma} = \sigma_n \mathbf{n} + \sigma_t \mathbf{t}$. This holds 
in $\mathbb{R}^2$. In $\mathbb{R}^3$ there are two tangential vectors. Thus, we have 

$$\int_\Gamma \sigma_{ij} n_j v_i \, d\Gamma = \int_\Gamma (\sigma_n \mathbf{n} + \sigma_t \mathbf{t}) \cdot \mathbf{v} \, d\Gamma . \tag{30}$$

We now can formulate the boundary conditions that allow the computation of 
this integral [7]: 

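1. The Dirichlet conditions on $\Gamma_0$ are 

   $$\Gamma_0 : \ \mathbf{u} = \mathbf{g} . \tag{31}$$

   The test function $\mathbf{v}$ has to vanish on $\Gamma_0$, $\Gamma_0 : \mathbf{v} = 0$. The line integral on $\Gamma_0$ 
   is then zero. 

2. On the boundary $\Gamma_t$ the tangential component of the velocity is known, 

   $$\Gamma_t : \ \mathbf{u} \cdot \mathbf{t} = g_t . \tag{32}$$

   Then the tangential component of $\mathbf{v}$ has to vanish on $\Gamma_t$, $\Gamma_t : \mathbf{v} \cdot \mathbf{t} = 0$. 
   For the line integral on $\Gamma_t$ to be known, we need the natural boundary condition 

   $$\Gamma_t : \ \sigma_n = (\boldsymbol{\sigma} \cdot \mathbf{n}) \cdot \mathbf{n} = -p + (\boldsymbol{\tau} \cdot \mathbf{n}) \cdot \mathbf{n} = \sigma_n^0 , \tag{33}$$

   i.e., the normal component of the stress vector $\vec{\sigma}$ is prescribed. This boundary 
   condition is useful to model entry and exit flows. 

   Equation (33) shows that in this case the pressure is completely determined. 

3. On the boundary $\Gamma_n$ the normal component of the velocity is known, 

   $$\Gamma_n : \ \mathbf{u} \cdot \mathbf{n} = g_n . \tag{34}$$

   The normal component of $\mathbf{v}$ has to be zero on $\Gamma_n$, $\Gamma_n : \mathbf{v} \cdot \mathbf{n} = 0$. The natural 
   boundary condition is then 

   $$\Gamma_n : \ \sigma_t = (\boldsymbol{\sigma} \cdot \mathbf{n}) \cdot \mathbf{t} = (\boldsymbol{\tau} \cdot \mathbf{n}) \cdot \mathbf{t} = \sigma_t^0 . \tag{35}$$

   This can be used to formulate a line of symmetry. 

4. On $\Gamma_f$ both $\sigma_n$ and $\sigma_t$ are prescribed. Then the line integral on $\Gamma_f$ is known 
   without any assumption for $\mathbf{v}$. This formulation is useful to model a free 
   surface. 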

The notion of a natural boundary condition stems from the fact that the trial 
functions do not fulfill these conditions. However, the weak solution satisfies 
them (in a weak sense) in a natural way. 

The definite form of the natural boundary conditions depends on the formu- 
lation of the differential operator. Starting with (5) we arrive at other natural 
boundary conditions which do not have a simple physical interpretation. 
2.5 Example 

We illustrate the boundary conditions and the weak formulation by considering 
a branching flow (Fig. 1). 

The line $\Gamma_n$ is a line of symmetry: 

$$\Gamma_n : \ \mathbf{u} \cdot \mathbf{n} = 0 , \quad \sigma_t = 0 . \tag{36}$$

Across the boundaries $\Gamma_{kt}$, $k = 1, 2$, there is an exit flow in the normal direction: 

$$\Gamma_{kt} : \ \mathbf{u} \cdot \mathbf{t} = 0 , \quad \sigma_n = -p + (\boldsymbol{\tau} \cdot \mathbf{n}) \cdot \mathbf{n} = -p_k . \tag{37}$$

The prescribed entry flow and the no-slip condition belong to $\Gamma_0$ with Dirichlet 
conditions for $\mathbf{u}$: 

$$\Gamma_0 : \ \mathbf{u} = \mathbf{g} . \tag{38}$$



Fig. 1. Different kinds of boundary conditions 



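In a "nearly developed" flow in a channel or tube the velocity in the cross-stream 
direction is near to zero, hence the term $(\boldsymbol{\tau} \cdot \mathbf{n}) \cdot \mathbf{n}$ is small and (37) is reduced 
to a condition for the pressure. 

The weak formulation reads as follows: 

find $\mathbf{u} \in V_g = \{ \mathbf{v} \in (H^1(\Omega))^n : \ \Gamma_0 : \mathbf{v} = \mathbf{g} , \ \Gamma_n : \mathbf{v} \cdot \mathbf{n} = 0 , \ \Gamma_t = \Gamma_{1t} \cup \Gamma_{2t} : \mathbf{v} \cdot \mathbf{t} = 0 \}$ 
and $p \in L^2(\Omega)$ with 

$$\begin{aligned}
\int_\Omega \Big( \frac{\eta}{2} (u_{i,j} + u_{j,i})(v_{i,j} + v_{j,i}) - p\, v_{i,i} \Big) \, dx
&= \int_\Omega \rho f_i v_i \, dx - \sum_{k=1}^{2} \int_{\Gamma_{kt}} p_k\, v_i n_i \, d\Gamma , \quad \forall \mathbf{v} \in V_0 , \\
-\int_\Omega q\, u_{i,i} \, dx &= 0 , \quad \forall q \in L^2(\Omega) .
\end{aligned} \tag{39}$$

Here the boundary term follows from (28) and (30) with the prescribed normal 
stress $-p_k$ of (37); the contributions of $\Gamma_0$ and $\Gamma_n$ vanish. 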
3 Discretization 

3.1 General Formulation 

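We obtain the FEM discretization by replacing the infinite-dimensional spaces of 
the continuous functions by finite-dimensional subspaces of "simple" functions. 
In the Galerkin method for the problem (16) the space $V_0$ is replaced by the 
$N$-dimensional space $V_{0h} = \mathrm{span}\{ \mathbf{v}_h^1, \ldots, \mathbf{v}_h^N \}$, spanned by the basis $\mathbf{v}_h^k$, $k = 1, \ldots, N$: 

$$\text{find } \mathbf{u}_h \in V_{0h} \text{ with } a(\mathbf{u}_h,\mathbf{v}_h) = l(\mathbf{v}_h) \quad \forall \mathbf{v}_h \in V_{0h} . \tag{40}$$

The coefficients of the unknown function 

$$\mathbf{u}_h = \sum_{j=1}^{N} u^j\, \mathbf{v}_h^j \tag{41}$$

can be found as a solution of the matrix problem: 

$$\sum_{j=1}^{N} a(\mathbf{v}_h^j,\mathbf{v}_h^i)\, u^j = l(\mathbf{v}_h^i) , \quad i = 1, \ldots, N , \tag{42}$$

or 

$$A\,U = G , \quad A_{ij} = a(\mathbf{v}_h^j,\mathbf{v}_h^i) , \quad G_i = l(\mathbf{v}_h^i) , \quad U_j = u^j . \tag{43}$$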


The functions used to approximate u are called trial or form functions. To obtain 
the weak formulation the equations are multiplied by test functions. 

The Galerkin method uses trial and test functions that are identical. If the 
trial and test spaces are approximated by different function sets, the algorithm 
is called the Petrov-Galerkin method or the method of weighted residuals. 

The matrix A is called the system matrix or stiffness matrix in the mechanics 
of solids. 
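
The assembly of the system (43) can be sketched for a much simpler model problem than the Stokes equation: the one-dimensional problem -u'' = f on (0, 1) with u(0) = u(1) = 0, discretized with piecewise linear "hat" functions on a uniform mesh. The model problem, the mesh size, and all names are assumptions chosen for this illustration.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[i] couples unknown i to i-1, upper[i] to i+1."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

N = 9                      # number of interior nodes (assumed)
h = 1.0 / (N + 1)          # uniform mesh size
nodes = [(k + 1) * h for k in range(N)]

# Stiffness matrix A_ij = a(v_j, v_i) = integral of v_i' v_j' for hat functions:
# 2/h on the diagonal, -1/h for neighbouring nodes, zero otherwise.
diag = [2.0 / h] * N
lower = [-1.0 / h] * N
upper = [-1.0 / h] * N

# Load vector G_i = integral of f v_i with f = 1 (each hat function has area h).
G = [h] * N

U = thomas(lower, diag, upper, G)

exact = [x * (1.0 - x) / 2.0 for x in nodes]   # solution of -u'' = 1
max_error = max(abs(a - b) for a, b in zip(U, exact))
print("maximal nodal error:", max_error)
```

For this particular one-dimensional problem the Galerkin solution coincides with the exact solution at the nodes, so the printed error is at rounding level. The matrix is sparse (tridiagonal) because each basis function overlaps only with its neighbours.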

Another approach is the substitution of (41) into the variational problem 
(17) and the forming of the Euler equations: 

$$\frac{\partial}{\partial u^i}\, J\Big( \sum_{j=1}^{N} u^j\, \mathbf{v}_h^j \Big) = 0 , \quad i = 1, \ldots, N . \tag{44}$$

This algorithm is called the Ritz method. As a result we obtain the system (43) 
again. 

If we want to use this formalism in the variational formulation (17) or (21) 
of the Stokes equation we need functions that are divergence free in $\Omega$ 
in the strong sense, $\mathbf{v}_h \in V_0$, or in the weak sense, $\mathbf{v}_h \in \tilde{V}_0$. Moreover these 
functions have to form a basis. It is not easy to construct such functions, and 
even if we succeed the functions often do not have a simple form and are not 
very suitable for computations. 
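Simpler functions can be employed if we start with the saddle-point formulation 
(26): 

$$\mathbf{v}_h \in X_0 = \{ \mathbf{v} \in (H^1(\Omega))^n : \mathbf{v}|_\Gamma = 0 \} , \quad q_h \in L_0^2(\Omega) .$$

But these functions have to satisfy the inf-sup condition. 

Let $X_{0h}$, $Q_h$ be approximations for $X_0$, $L_0^2$. The saddle point problem 
reads 

$$\begin{aligned}
&\text{find } \mathbf{u}_h \in X_{0h} , \ p_h \in Q_h \\
&\text{with } a(\mathbf{u}_h,\mathbf{v}_h) + b(\mathbf{v}_h,p_h) = l(\mathbf{v}_h) , \quad \forall \mathbf{v}_h \in X_{0h} , \\
&\phantom{\text{with }} b(\mathbf{u}_h,q_h) = 0 , \quad \forall q_h \in Q_h .
\end{aligned} \tag{45}$$

With the bases $\{ \mathbf{v}_h^1, \ldots, \mathbf{v}_h^N \}$ of $X_{0h}$ and $\{ q_h^1, \ldots, q_h^M \}$ of $Q_h$ we have 

$$\mathbf{u}_h = \sum_{j=1}^{N} u^j\, \mathbf{v}_h^j , \qquad p_h = \sum_{j=1}^{M} p^j\, q_h^j ,$$

and we obtain the following matrix problem, which is characteristic of the 
approximation of saddle-point problems, 

$$\begin{pmatrix} A & B \\ B^T & 0 \end{pmatrix} \begin{pmatrix} U \\ P \end{pmatrix} = \begin{pmatrix} G \\ 0 \end{pmatrix} , \tag{46}$$

with $A_{ij} = a(\mathbf{v}_h^j,\mathbf{v}_h^i)$, $B_{ij} = b(\mathbf{v}_h^i,q_h^j)$, $U_j = u^j$, $P_j = p^j$, $i = 1, \ldots, N$, 
$j = 1, \ldots, M$. In the FEM this formalism is called the mixed method. The system 
matrix 

$$\mathcal{A} = \begin{pmatrix} A & B \\ B^T & 0 \end{pmatrix}$$

is sparse and symmetrical, but not positive definite. In small 
and medium-sized problems the solution of (46) can be realized by a direct 
Gauss algorithm taking advantage of the sparsity of $\mathcal{A}$. In large problems, 
e.g., three-dimensional ones, we have to use specific iteration procedures. Suitable 
algorithms are penalty methods, preconditioned conjugate gradient (cg) methods 
and multigrid methods [1, 2]. 

We emphasize that the standard iteration procedures are not appropriate 
for this indefinite problem. 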
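
The indefiniteness of the block matrix in (46), and the way a penalty method restores a positive definite system, can be seen on a small example. The sketch below uses toy matrices assumed for illustration (not a Stokes discretization) and a Cholesky factorization, which exists only for positive definite matrices, as a test. The penalty variant shown, which replaces the constraint by B^T U = eps * P with a small parameter eps, is one common formulation and is an assumption here.

```python
import math

def cholesky(M):
    """Lower-triangular Cholesky factor; raises ValueError if M is not positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def solve_spd(M, rhs):
    """Solve M x = rhs by forward and backward substitution with the Cholesky factor."""
    L = cholesky(M)
    n = len(rhs)
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# Toy data (assumed): A is N x N and positive definite, B is N x M as in (46).
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
B = [[1.0], [1.0], [1.0]]
G = [1.0, 0.0, 0.0]
N, M = 3, 1

# Block matrix [[A, B], [B^T, 0]]: the zero diagonal block makes it indefinite.
saddle = [A[i] + B[i] for i in range(N)]
saddle += [[B[i][j] for i in range(N)] + [0.0] * M for j in range(M)]
try:
    cholesky(saddle)
    saddle_is_spd = True
except ValueError:
    saddle_is_spd = False

# Penalty method: eliminate P = B^T U / eps and solve (A + B B^T / eps) U = G.
eps = 1e-4
A_pen = [[A[i][k] + sum(B[i][j] * B[k][j] for j in range(M)) / eps
          for k in range(N)] for i in range(N)]
U = solve_spd(A_pen, G)
P = [sum(B[i][j] * U[i] for i in range(N)) / eps for j in range(M)]
print("saddle matrix positive definite:", saddle_is_spd)
print("penalty solution U =", U, "P =", P)
```

The penalized solution approaches the solution of the constrained system as eps tends to zero, at the price of a growing condition number of the penalized matrix, so eps cannot be chosen arbitrarily small in practice.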



3.2 Finite Elements for Saddle-Point Problems 

The key requirement is the satisfaction of the inf-sup stability condition, which 
involves both velocity and pressure spaces. This condition is one of the most 
celebrated results in the mathematical theory of finite elements. 

Equal-order interpolations of velocity $\mathbf{u}_h$ and pressure $p_h$ fail to satisfy the 
LBB condition. Different pathological constellations can occur, e.g., if there are 
too many constraints due to incompressibility only the trivial solution $\mathbf{u}_h = 0$ 
exists (locking effect), or the constraints are not independent and the pressure 
is not uniquely defined (checkerboard instability). By adding suitable functions 
for $\mathbf{u}_h$ (bubble functions) stable elements can be designed. In this way we obtain 
the mini-element. 
Mini-element ($P_1$ + bubble - $P_1$). 

Let $T_k$ be a triangle (see Fig. 2) and $P_1$ denote the set of polynomials of first 
degree on $T_k$, $P_1 = \mathrm{span}\{1, x_1, x_2\}$. In the unit triangle the barycentric 
coordinates are given by 

$$\lambda_1 = x_1 , \quad \lambda_2 = x_2 , \quad \lambda_3 = 1 - x_1 - x_2 . \tag{47}$$

Fig. 2. Unit triangle and the position of the degrees of freedom of $\mathbf{u}_h$, $p_h$ 

The "bubble function" $B(x) = \lambda_1 \lambda_2 \lambda_3$ vanishes on the sides of the triangle. The 
trial functions for $\mathbf{u}_h$ are the linear functions from $P_1$ combined with bubbles 
$B(x)$: 

$$\begin{aligned}
X_h &= \{ \mathbf{v}_h \in (C^0(\bar{\Omega}))^2 : \mathbf{v}_h \in (P_1 \oplus \mathrm{span}\{B(x)\})^2 \text{ on } T_k \} , \\
Q_h &= \{ q_h \in C^0(\bar{\Omega}) : q_h \in P_1 \text{ on } T_k \} .
\end{aligned} \tag{48}$$

This approximation fulfills the inf-sup condition. 
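
The defining properties of the bubble function are easy to check numerically. The sketch below evaluates B(x) on the unit triangle with the barycentric coordinates of (47); the function names and sample points are chosen for this illustration.

```python
def barycentric(x1, x2):
    """Barycentric coordinates on the unit triangle, as in (47)."""
    return (x1, x2, 1.0 - x1 - x2)

def bubble(x1, x2):
    """Cubic bubble function: the product of the three barycentric coordinates."""
    l1, l2, l3 = barycentric(x1, x2)
    return l1 * l2 * l3

# Sample points on the three sides of the unit triangle.
ts = [k / 10.0 for k in range(11)]
side_values = ([bubble(t, 0.0) for t in ts]           # side x2 = 0
               + [bubble(0.0, t) for t in ts]         # side x1 = 0
               + [bubble(t, 1.0 - t) for t in ts])    # side x1 + x2 = 1

centroid_value = bubble(1.0 / 3.0, 1.0 / 3.0)
print("largest |B| on the sides:", max(abs(v) for v in side_values))
print("B at the centroid:", centroid_value)
```

Since B vanishes on all sides, adding it to the velocity space does not affect the continuity between neighbouring triangles; it contributes one additional interior degree of freedom per velocity component, with maximal value 1/27 at the centroid.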



Taylor-Hood Element ($P_2$ - $P_1$). On a triangle (see Fig. 3) we take a 
quadratic approximation of $\mathbf{u}_h \in (P_2)^2 = (\mathrm{span}\{1, x_1, x_2, x_1 x_2, x_1^2, x_2^2\})^2$ and a linear 
one of $p_h \in P_1$: 

$$\begin{aligned}
X_h &= \{ \mathbf{v}_h \in (C^0(\bar{\Omega}))^2 : \mathbf{v}_h \in (P_2)^2 \text{ on } T_k \} , \\
Q_h &= \{ q_h \in C^0(\bar{\Omega}) : q_h \in P_1 \text{ on } T_k \} .
\end{aligned} \tag{49}$$

The inf-sup condition is satisfied and the element has proved itself in practice. We obtain a 
variant of this approximation by subdividing the triangle into four subtriangles 
and interpolating the velocity by linear functions on each subelement. This element 
can also be used with rectangles if the polynomials $P_1$, $P_2$ are replaced by 
$Q_1 = \mathrm{span}\{1, x_1, x_2, x_1 x_2\}$, $Q_2 = Q_1 \cup \{x_1^2, x_2^2, x_1^2 x_2, x_1 x_2^2, x_1^2 x_2^2\}$. Such an 
approximation also works on tetrahedrons and bricks in $\mathbb{R}^3$. 




Fig. 3. Nodes for the Taylor-Hood element 

$P_{k+1}$ + bubble - $P_k$ discontinuous. In these elements (see Fig. 4) the pressure 
is discontinuous on the sides and the nodes are located at points that are 
used for the Gauss integration. 

$$\begin{aligned}
X_h &= \{ \mathbf{v}_h \in (C^0(\bar{\Omega}))^2 : \mathbf{v}_h \in (P_{k+1} \oplus \mathrm{span}\{B(x)\})^2 \text{ on } T_j \} , \\
Q_h &= \{ q_h \in L^2(\Omega) : q_h \in P_k \text{ on } T_j \} , \qquad k = 0, 1 .
\end{aligned} \tag{50}$$

Fig. 4. Nodes with discontinuous pressure 



$Q_1$ - $P_0$ Element. In the Stokes equation $\Delta \mathbf{u}$ and $\nabla p$ are in balance. Therefore, 
one could expect that an interpolation for $p$ one degree lower than for $\mathbf{u}$ will work well. 
Considering the element $Q_1$ - $P_0$, we see that this assumption does not 
automatically give stable approximations. Let $T_k$ be a rectangle. We approximate 
the velocity $\mathbf{u}_h$ using functions in $Q_1 = \mathrm{span}\{1, x_1, x_2, x_1 x_2\}$, $\mathbf{u}_h \in (Q_1)^2$, and the 
pressure $p_h$ using piecewise constant functions, $p_h \in P_0$, on $T_k$: 

$$\begin{aligned}
X_h &= \{ \mathbf{v}_h \in (C^0(\bar{\Omega}))^2 : \mathbf{v}_h \in (Q_1)^2 \text{ on } T_k \} , \\
Q_h &= \{ q_h \in L^2(\Omega) : q_h \in P_0 \text{ on } T_k \} .
\end{aligned} \tag{51}$$

The inf-sup condition does not hold and the pressure approximations are unstable 
(checkerboard instability). The element can be stabilized by adding suitable 
functions for $\mathbf{u}_h$ defined on macroelements. 
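
The checkerboard mode can be exhibited directly. The sketch below assumes a uniform mesh of n x n square cells of side h on the unit square, bilinear velocity basis functions attached to the interior nodes (homogeneous Dirichlet values on the boundary), and piecewise constant pressures. The mesh layout, the function names, and the use of b(v,q) as minus the integral of q times the divergence of v are assumptions of this illustration.

```python
def b_on_basis(q, n, h):
    """b(v, q) for both velocity components of every interior Q1 basis function.

    q[i][j] is the constant pressure on cell (i, j); i counts cells in the
    x1 direction, j in the x2 direction.
    """
    values = []
    for i in range(1, n):          # interior nodes only
        for j in range(1, n):
            # The four cells sharing node (i, j): lower-left, lower-right,
            # upper-left, upper-right.
            ll, lr, ul, ur = q[i - 1][j - 1], q[i][j - 1], q[i - 1][j], q[i][j]
            # The integral of d(phi)/dx1 over a cell is +h/2 on the two left
            # cells and -h/2 on the two right cells; similarly for d(phi)/dx2
            # with the lower and upper cells.
            bx = -(h / 2.0) * (ll + ul - lr - ur)
            by = -(h / 2.0) * (ll + lr - ul - ur)
            values.append((bx, by))
    return values

n = 4                       # cells per direction (even, so the mode has zero mean)
h = 1.0 / n

checkerboard = [[(-1.0) ** (i + j) for j in range(n)] for i in range(n)]
linear = [[float(i) for j in range(n)] for i in range(n)]   # pressure growing in x1

cb_values = b_on_basis(checkerboard, n, h)
lin_values = b_on_basis(linear, n, h)
mean_cb = sum(sum(row) for row in checkerboard) / n**2

print("max |b(v, q_checkerboard)|:", max(max(abs(bx), abs(by)) for bx, by in cb_values))
print("mean of checkerboard mode:", mean_cb)
print("b(v, q_linear) at first node:", lin_values[0])
```

The checkerboard pressure has zero mean and yet b(v, q) vanishes for every discrete velocity, so the supremum in the discrete version of (27) is zero for this q and no positive constant $\beta$ can exist. Such a mode can be added to any discrete pressure solution without changing the equations.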



The elements considered above are conforming elements because the trial 
functions also belong to the function spaces of the continuous problem, 

$$\mathbf{v}_h \in X_0 , \ q_h \in L_0^2 \quad \text{or} \quad X_{0h} \subset X_0 , \ Q_h \subset L_0^2 . \tag{52}$$

Due to the definition (23) of $X_0$ the continuity of the velocity $\mathbf{v}_h$ is required. 

It can be shown that there are approximations that do not satisfy this condition but 
work very well. These methods are called nonconforming. The triangle element with 
piecewise linear velocities discontinuous on the sides belongs to this class. 

Crouzeix-Raviart Element ($P_1$ discontinuous - $P_0$ discontinuous). (See Fig. 5.) 

$$\begin{aligned}
X_h &= \{ \mathbf{v}_h : \mathbf{v}_h \in (P_1)^2 \text{ on } T_k , \text{ continuous in the middle of the sides} \} , \\
Q_h &= \{ q_h \in L^2(\Omega) : q_h \in P_0 \text{ on } T_k \} .
\end{aligned} \tag{53}$$

Fig. 5. Nodes of the Crouzeix-Raviart element 
The functions $\mathbf{v}_h \in X_h$ are continuous only at the nodes located at the middle of 
the sides. It is easy to construct a basis with weak zero divergence, $b(\mathbf{v}_h,q_h) = 0$, 
$\forall q_h \in Q_h$. Hence it is not necessary to use the saddle-point problem (26) but 
we can directly apply the variational formulation (21). 

The precision of the element is not high (order of approximation $O(h)$). A 
similar construction is possible on tetrahedrons in 3D. 
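
A common way to write the local basis of this nonconforming element uses the barycentric coordinates: the function attached to the midpoint of the side opposite vertex i is 1 - 2 λ_i. The sketch below checks the nodal property on the unit triangle. The vertex ordering and all names are assumptions of this illustration.

```python
# Vertices of the unit triangle, ordered so that lambda_i = 1 at vertex i.
VERTICES = [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]

def barycentric(x1, x2):
    """Barycentric coordinates on the unit triangle."""
    return (x1, x2, 1.0 - x1 - x2)

def cr_basis(i, x1, x2):
    """Linear basis function attached to the midpoint of the side opposite vertex i."""
    return 1.0 - 2.0 * barycentric(x1, x2)[i]

def side_midpoint(i):
    """Midpoint of the side opposite vertex i."""
    (a1, a2), (b1, b2) = [VERTICES[k] for k in range(3) if k != i]
    return (0.5 * (a1 + b1), 0.5 * (a2 + b2))

# Entry [i][j] is basis function i evaluated at the midpoint of side j.
nodal_matrix = [[cr_basis(i, *side_midpoint(j)) for j in range(3)] for i in range(3)]
value_at_own_vertex = cr_basis(0, *VERTICES[0])

for row in nodal_matrix:
    print(row)
print("value of basis function 0 at vertex 0:", value_at_own_vertex)
```

Each basis function equals one at its own midpoint and zero at the two others, so the midpoint values are the degrees of freedom. Away from the midpoint, the function does not vanish on the adjacent sides (it reaches -1 at the opposite vertex), which is why functions in $X_h$ are in general discontinuous across element sides.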

4 Final Remarks 

In the FEM, important questions are related to the generation of the finite- 
element mesh, the assembling of the system matrix including a numerical inte- 
gration algorithm, and the solution procedure of the sparse but large system of 
the discrete equations; see [1, 6]. 

The saddle-point formulation (or mixed method) gives stable solutions only 
if the combination of the velocity-pressure approximation fulfills the inf-sup 
condition. In 2D some methods are known to prove this inequality, but in 3D it 
represents a great challenge. 

Therefore a modified weak formulation is sought in order to use elements that 
do not satisfy the inf-sup condition. This leads to the least-squares method and 
the Petrov-Galerkin method. For current research on these approaches, see [3, 10]. 
Abstract. In this paper we briefly outline some principles of parallel architectures 
and discuss several impacts on their programming models. First, parallel computers 
are generally classified. A description of the most important classes - multiprocessors 
and massively parallel systems - follows, with some details about chosen machines. 
The corresponding programming models for shared-memory and distributed-memory 
architectures are introduced. The special relationship between machine architecture 
and efficient parallel programming is emphasized here. The paper concludes with some 
hints for the software developer about where to use which parallel programming model. 



1 Introduction 

The last ten years have seen parallel computers employed for the solution of
complex scientific, mathematical, and technical problems, and they have
developed into a key technology. The paradigm shift towards parallelism has
led to changes on all levels, from machine hardware to application programs. A
broad spectrum of parallel architectures has been developed.

In general, a parallel algorithm can be efficiently implemented only if it is 
designed for the specific needs of the architecture. Thus the knowledge of primary 
computer design principles is of course relevant for software developers as well 
as numerical analysts in the field of computational physics. This fact is often 
underestimated by software developers. 

For this reason, in the following we present a brief introduction to basic 
architectures of parallel computers. 

2 Overview on Architecture Principles 

Before the development of “vector computers” in the 1970s, so-called “main- 
frames” were also used for scientific computing although they have typically 
been the workhorses of data-processing departments. 

The first supercomputer architectures involved the use of one - or, at most, a 
few - of the fastest processors that could be obtained by increasing the packing 
density, minimizing switching time, heavily pipelining the system, and employing 
vector processing techniques, which apply a small set of program instructions 
repeatedly to multiple data elements.

Table 1. (Vector) supercomputer performance [5]

Computer type         MFLOPS/Processor   Clock Rate [ns]
Cray Y-MP                    300               6.0
Cray C90                     952               4.2
NEC SX 3/14R                6400               2.5
Fujitsu VP 2600/10          5000               3.2
Hitachi S-3800/180          8000               2.0

Vector processing has proven to be highly effective for certain numerically 
intensive applications, but much less so for more commercial uses such as online 
transaction processing or databases. 

In fact, the sheer computational speed (Table 1) was achieved at substantial
cost, namely through sophisticated, highly specialized hardware design and the
renunciation of techniques such as virtual memory (which facilitates
programmability). In particular, the latter has led to the development of a
considerable body of specialized program code.

Another approach, one that respects conventional programmability, has led to
the design of so-called multiprocessor systems (MPS). Only small changes to
earlier uniprocessor systems had to be made: a number of processor elements
(PEs) of the same type were added to multiply the performance of a
single-processor machine. Although there were effects on the programming model,
at least the essential fact of a unified global memory could be maintained.

Further developments dropped the requirement of a unified global memory
because it cannot be physically realized where hundreds or thousands of
processors are used. The total memory is distributed over all processors, each
one having a fraction in the form of a local memory.

In the 1980s the first massively parallel processors (MPP) began to appear, 
with the single goal of achieving far greater computational power than with 
vector computers at greatly improved price/performance ratios by using low- 
cost standard processors. 

A still essentially unsolved problem for the use of such systems is the devel- 
opment of appropriate programming models. No standard programming model 
that satisfies the needs of all applications has yet been found although a variety of 
competing models have been developed, including message passing, data-parallel 
programming, and the virtual shared-memory concept. However, the efficient use 
of parallel computers with distributed memory requires the exploitation of data 
locality, which can indeed be found in most important numerical applications. 

Because it is easier to bring activities onto established architectures than
onto parallel machines, high-performance workstations are often still preferred
for program implementations. If the performance needs increase, a cluster of
interconnected workstations (WSC) can also be considered as a parallel machine.
But typically the interconnection network of such clusters is characterized by
relatively small bandwidths (some MBytes/s for 1 KByte messages) and high
latency¹ (in the range of milliseconds for 1 KByte messages). Thus suitable
applications are of a competitive type rather than of a cooperative type (the
latter naturally having high communication requirements).
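
To get a feeling for why such clusters favor loosely coupled applications, the
following minimal Python sketch estimates the share of run time spent on
communication. It assumes that messages are not overlapped with computation and
that every message costs the full latency in the sense of the footnote; the
function name and the sample numbers (1 ms per message) are illustrative
assumptions, not measurements.

```python
def communication_fraction(n_messages, latency_s, compute_s):
    """Fraction of total run time spent exchanging messages.

    Assumption: communication and computation do not overlap, and each
    message costs `latency_s` seconds end to end (pack, send, receive, unpack).
    """
    comm_s = n_messages * latency_s
    total_s = comm_s + compute_s
    if total_s == 0:
        return 0.0
    return comm_s / total_s


if __name__ == "__main__":
    # Hypothetical workload: 9 s of computation, 1 ms per 1 KByte message.
    for n in (10, 1000, 100000):
        share = communication_fraction(n, 0.001, 9.0)
        print(f"{n:>6} messages -> {share:.1%} of the time is communication")
```

With these assumed numbers, a program that exchanges only a few messages is
hardly affected, whereas a program exchanging very many short messages is
dominated by the network.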

Nowadays we realize that all the mentioned types - MPS, MPP, and WSC - 
as well as advanced types of vector computers (multivector computers) are inte- 
grated in a network environment and can be combined to form a heterogeneous 
supercomputer. The recent development of a message-passing interface (MPI) is 
a landmark achievement in making such systems programmable. 

In summary, each architecture has its strong and weak points, and it will take
continuous improvement to overcome the drawbacks. Parallel computer development
is currently heavily influenced by technological capabilities. As a consequence
we notice a trend toward massively parallel arrangements of symmetric
multiprocessor systems, which we call MPP/SMP.




Fig. 1. Flynn’s classification of computer architectures



3 General Classification 

Michael Flynn [6] introduced a classification of various computer architectures
based on the notions of instruction streams (I streams) and data streams
(D streams) (Fig. 1). Conventional sequential machines with one processing
element (PE) are called



¹ Latency is the total amount of time it takes for the sender to pack the message and
send it to the receiver, and for the receiver to receive the message and copy (unpack) 
it into its own buffer. 




344 Wolfgang Rehm and Thomas Radke 



single instruction single data (SISD) computers. Multiple instruction multiple 
data (MIMD) machines cover the most popular models of parallel computers. 

There are two major classes of parallel computers, namely shared-memory 
multiprocessors and message-passing multicomputers (Fig. 2). 



Fig. 2. Parallel computer classes [panel: Shared Memory Multiprocessor; labels:
Global Memory, Memory Connect (Bus, Crossbar, Multistage Network)]



The processors in a multiprocessor system communicate with each other 
through shared variables in a common memory, whereas each computer node 
in a multicomputer system has a local memory, not shared with other nodes. 
Interprocessor communication is done here through message passing. 

4 Multiprocessor Systems 

A multiprocessor system (MPS) is typically a RISC-based shared-memory mul- 
tiprocessor machine designed to provide a moderate amount of parallelism (up 
to 30 processors) to achieve more power than high-end workstations offer (for 
RISC processors see Table 2). 



Table 2. Performance of some RISC CPUs

CPU Type       Clock Rate [MHz]   Perf. [MIPS]
Alpha 21164          300             1200
MPC601                80              240
MPC604               100              400
MPC620               133              532
SuperSparc            60              180
UltraSparc           167              668
PA7200               140              280
R4400SC              150              150
R10000               200              800
MC68060               50              100
Pentium 100          100              200
Pentium Pro          133              399

Table 3. Multiprocessor systems

Company            Model                Scalability    I/O Type   Bus Bandwidth
                                        [processors]
Sequent            Symmetry 5000        1...30         symm.      240 MByte/s
Silicon Graphics   PowerChallenge       1...30         symm.      1.2 GByte/s
Sun                SPARCstat. 20 HS14   1...4          symm.      -
Compaq             ProLiant 4000        1...4          symm.      267 MByte/s



Most computer manufacturers have multiprocessor (MP) extensions to their 
uniprocessor product line (Table 3). All additional processors are attached to the 
same global bus. Dedicated bus lines are reserved for coordinating the arbitration 
process between several requestors. The scalability of such systems is restricted 
to some dozens of processors due to the limited bandwidth of the common bus, 
which must be shared by all processors. The processors have equal access times 
to all memory nodes, which is why it is called a uniform memory-access (UMA) 
multiprocessor model. 

In contrast, in nonuniform memory-access (NUMA) models the access time varies
with the location of the memory word. This is because the memory is actually
distributed, but hardware mechanisms ensure that the collection of all local
memories forms a global address space accessible by all processors. A
processor’s local memory can be accessed faster than a remote one. Such a
logically shared memory based on physically distributed memory is called a
virtual shared memory (VSM), especially if there is essential hardware support
to realize this (Fig. 3). One special version of a VSM architecture is a
cache-only memory architecture (COMA) such as in the KSR-1 machine (Fig. 4).
Caches copy data from other caches if necessary. There is a continuous process
of data migration. A cache attracts the needed data, and in the ideal case the
user is completely freed from predefining the data layout. The drawbacks of
such wonderful architectures lie in the synchronization costs for maintaining
the cache coherency as well as the global synchronization (via semaphores). For
further modifications of COMA models see [8].

Fig. 3. Virtual shared memory (physical view and logical view)

Fig. 4. KSR ALLCACHE architecture [2]

Another distinction can be made between asymmetric and symmetric
multiprocessor systems (Fig. 5). When all processors have equal access to all
peripheral devices, the system is called a symmetric multiprocessor (SMP). All
processors are equally capable of running the executive programs, such as the
operating system kernel and I/O service routines. In asymmetric systems only a
master processor can execute the OS and handle I/O. Thus I/O becomes a
bottleneck. Today we often find such systems in the form of two-processor
stations, whereas symmetric solutions appear as four-processor board-based
workstations or servers (Table 3).

Fig. 5. I/O types of multiprocessor systems

To overcome the drawbacks of the limited speed of a unified common global
bus, connection schemes with crossbar technology have recently been developed
[3]. The advantage is that more than one connection can be activated (dark
points in Fig. 6) at the same time. The achievable transfer rates can be about
600 MByte/s per CPU. The global bus is still in use but only as a broadcast
medium for the snooping-bus cache-coherence mechanism [8].




Fig. 6. Crossbar switch 



5 Massively Parallel Processor Systems 

Massively parallel processor systems (MPP) usually consist of hundreds to
several thousands of identical processors, each of which has its own memory
(distributed memory). The processors communicate with each other by message
passing. There is no common global memory, although there are some approaches
supporting a virtual shared memory by combinations of hardware and software.
In this sense the KSR virtual-shared memory computer can be classified as an
MPP system.

- Message transmission network for linking PEs
- A crossbar switch consists of 3 crossbars, one for each axis, to create 2D
  and 3D structures.
- At each level, a crossbar switch is capable of switching up to 8 × 8
  connections.
- Data transfer rate: 300 MByte/s in each direction of the bidirectional ports

Fig. 7. Three-dimensional crossbar network in the Hitachi SR2201 [1] 



Distributed-memory multicomputers are most useful for problems that can 
be broken down into many relatively independent parts, each of which requires 
extensive computation. The interactions should be small because the overhead 
of interprocessor communication can degrade the system performance. The main 
limiting factors are the bandwidth and latency. Modern communication system 
techniques use special latency-reduction protocols such as wormhole routing. 
Moreover, different latency-hiding methods in software may be applicable. 

A fully connected network (clique) is applicable only for small numbers of 
nodes. To provide high-speed connections among individual processing nodes 
most parallel machines employ 2D or 3D crossbar switches, e.g., the Cray T3D 
and Hitachi SR2201 (Fig. 7). Table 4 shows the characteristics for prominent 
networks. 
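
The degree and diameter entries of Table 4 can be checked by brute force for
small networks. The sketch below builds a ring, a hypercube, and a 2D torus as
adjacency lists and measures the diameter with breadth-first search; all
function names are hypothetical helpers chosen for this illustration.

```python
from collections import deque


def ring(n):
    """Ring of n nodes: node i is linked to i-1 and i+1 (mod n)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def hypercube(dim):
    """Hypercube with 2**dim nodes: neighbors differ in exactly one bit."""
    return {i: [i ^ (1 << b) for b in range(dim)] for i in range(1 << dim)}


def torus2d(side):
    """side x side grid with wrap-around links in both dimensions."""
    return {
        (x, y): [((x + 1) % side, y), ((x - 1) % side, y),
                 (x, (y + 1) % side), (x, (y - 1) % side)]
        for x in range(side) for y in range(side)
    }


def eccentricity(graph, start):
    """Largest hop count from `start` to any other node (BFS)."""
    dist = {start: 0}
    todo = deque([start])
    while todo:
        u = todo.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                todo.append(v)
    return max(dist.values())


def diameter(graph):
    """Longest shortest path between any two nodes."""
    return max(eccentricity(graph, s) for s in graph)


if __name__ == "__main__":
    print("ring, N=8:      ", diameter(ring(8)))        # floor(N/2)
    print("hypercube, N=16:", diameter(hypercube(4)))   # log2 N
    print("2D torus, N=25: ", diameter(torus2d(5)))     # 2*floor(sqrt(N)/2)
```

The exhaustive search is only practical for small N, but it makes the
trade-off visible: the hypercube keeps the diameter small at the price of a
node degree that grows with the machine size.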

6 Multiple Shared-Memory Multiprocessors 

One approach - a technology-driven one - for building a massively parallel sys- 
tem involves multiple shared-memory multiprocessors connected by a very high- 
bandwidth interconnect, such as HiPPI, in an optimized topology. 

One such interconnection of high-performance shared-memory multiproces- 
sors (of MPP/SMP type), the PowerChallenge array from SGI Corp., has been 
demonstrated to solve so-called “grand challenge” problems. 

A node in a message-passing interconnect is represented by a full SMP. A
great advantage of such arrangements is that the computation-to-communication
ratio (a measure for the proportion of the maximum computational power and
communication peak performance) can be very high. That is, the amount of
message passing is low compared with the amount of work to be done in each
SMP node for each message sent.

Table 4. Properties of interconnection networks

Type           Degree   Connections   Diameter          Bisectional Width   Symm.
Clique         N-1      N(N-1)/2      1                 (N/2)^2             yes
Linear chain   2        N-1           N-1               1                   no
Ring           2        N             ⌊N/2⌋             2                   yes
Binary tree    3        N-1           2((log2 N) - 1)   1                   no
2D grid        4        2N - 2√N      2(√N - 1)         2√N                 no
2D torus       4        2N            2⌊√N/2⌋           2√N                 yes
Hypercube      log2 N   N log2 N      log2 N            N/2                 yes

7 Multithreading Programming Model 

With the evolution of MPS originating from conventional uniprocessor machines, 
the programming of such systems was historically formed by features of UNIX. 
This classical operating system allowed the quasi-concurrent execution of several 
tasks (multitasking) and provided some mechanisms for inter-process communi- 
cation (e.g., pipes, sockets, shared-memory segments). These kernel services were 
quite expensive in their implementation (they are based on underlying standard 
network protocols) and caused high overheads. So they seemed to be unsuitable 
for efficient parallel programming. 

For this reason the traditional task concept of UNIX was extended in such a
manner that a process can have more than one execution flow and may be divided
into several threads of control that are independent of each other and thus
can be executed in parallel. In this programming model a thread can be thought
of as a light-weight process with much less state information than a normal
UNIX task - it just owns a stack, a register set, and a program counter. All
threads see the same address space. Communication between threads is performed
through shared-memory variables. Access to these variables is managed by
synchronization primitives (e.g., mutexes, semaphores, and monitors).
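
As a minimal sketch of this model, the following Python example lets several
threads add partial sums to one shared variable, with a mutex protecting the
update. It uses Python's standard `threading` module rather than a UNIX thread
library, and the function name `parallel_sum` is a hypothetical choice. Note
that the standard CPython interpreter serializes bytecode execution, so the
example illustrates the programming model, not a speed-up.

```python
import threading


def parallel_sum(values, n_threads=4):
    """Sum `values` with several threads that update one shared total."""
    total = 0                      # shared variable, visible to all threads
    lock = threading.Lock()        # mutex guarding the shared variable

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)       # private work, no synchronization needed
        with lock:                 # critical section: one thread at a time
            total += partial

    # Each thread gets every n_threads-th element of the input.
    chunks = [values[i::n_threads] for i in range(n_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                   # wait until every thread has finished
    return total


if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))
```

Keeping the critical section as short as possible (one addition per thread
here) limits the synchronization overhead.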

In general, the application programmer does not have to worry at all about 
the mapping of his threads onto the processor set of the MPS. This functional- 
ity can be fully implemented in the operating system kernel (pure kernel-level 
threads) or is part of a thread library with kernel support (mixed user/kernel 
level threads). 

The exclusive use of kernel-level threads is reasonable if the number of threads
does not exceed the number of processors in the system. Each thread is
permanently bound to its own processor and can run fully parallel to the
others. Synchronization can be implemented as busy waiting (the associated
processor is not released but spins until a condition becomes true).

Fig. 8. Multithreading with user- and kernel-level threads

If there are more threads than available processors (and this is the most fre- 
quent case) busy waiting between threads is no longer applicable because of pos- 
sible deadlocks. In this case synchronization can cause a thread switch on a pro- 
cessor. The switching of kernel-level threads can only be done in the kernel, i.e., 
special system calls are needed. The resulting overhead (kernel-thread context 
switch plus entering/leaving the kernel) may drastically decrease the efficiency 
of a pure kernel-level thread management. That is why a mixed thread man- 
agement is more favorable in this case: the programmer uses user-level threads, 
which are managed by a thread library; these user-level threads are internally 
mapped onto some kernel-level threads with their number corresponding to the 
number of available processors (see Fig. 8). Thread context switches can now 
completely be done in user mode, and time-expensive system calls are unneces- 
sary. 
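
A loose analogy to this mixed scheme, sketched under assumptions: many small
tasks are multiplexed onto a fixed number of operating-system threads, one per
available processor. Python's `ThreadPoolExecutor` is used here only to
illustrate the many-to-few mapping; it is not a user-level thread library in
the sense of the text, and `run_tasks` is a hypothetical helper.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def run_tasks(tasks, n_workers=None):
    """Run many callables on a fixed pool of worker threads.

    Returns the task results (in submission order) and the number of
    distinct worker threads that actually executed tasks.
    """
    n_workers = n_workers or os.cpu_count() or 1
    used_threads = set()

    def wrapped(task):
        used_threads.add(threading.get_ident())   # record the carrier thread
        return task()

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(wrapped, tasks))
    return results, len(used_threads)


if __name__ == "__main__":
    tasks = [lambda i=i: i * i for i in range(100)]
    results, n_used = run_tasks(tasks, n_workers=4)
    print(len(results), "tasks ran on", n_used, "worker threads")
```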

It is true that the problem of optimal load balancing in MPS is not as difficult
as in MPP. Because of the global shared memory, every thread can in principle
be scheduled on any processor without explicit migration. But in NUMA
architectures, thread locality must be taken into consideration for achieving efficient
multithreading. For instance, if there are still some thread data in a proces- 




Principles of Parallel Computers 351 



sor’s cache, this thread should be scheduled with precedence on that processor 
again. This technique, called memory-conscious scheduling, is used in particular 
in systems with multi-level memory hierarchies [4]. 

The efficiency of I/O intensive multithreaded applications strongly depends 
on the I/O architecture. In asymmetric systems every I/O operation forces a 
thread switch onto the master processor (which is capable of serving the re- 
quest, see Fig. 5). So the master may become a bottleneck. Only SMP systems 
guarantee a scalable I/O performance because each I/O request can be served 
on the processor where the thread resides. This circumstance is less decisive for 
multithreaded programs with a high ratio of computation to communication. 

At present, there exist a number of modern commercial and noncommercial
standard operating systems that support multithreading and symmetric
multiprocessing: Solaris 2.x from SUN, Mach, Linux-SMP as public-domain
software, and Windows-NT from Microsoft. Research is aimed at the development
of a uniform multithreading programming interface (currently proposed as
“POSIX 1003.4a Threads Extension Draft”). With this, application programmers
should be able to port their programs easily to any MPS architecture.



8 Message-Passing Programming Model 

Message passing is the natural programming model for distributed-memory ar- 
chitectures. It is based on Hoare’s CSP concept (communicating sequential pro- 
cesses [7]), where an application consists of several sequential tasks that com- 
municate with each other by exchanging data over communication channels. 
These tasks are distributed among the nodes of an MPP and thus are executed 
in parallel. The communication channels are mapped onto the communication 
network. The communication hardware in modern MPP systems is capable of 
operating independently of its assigned compute node so that communication 
and computation can be done concurrently. 
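
The structure of such a program can be sketched in a few lines. The example
below emulates two sequential tasks connected by channels on a single machine:
Python threads stand in for the tasks and `queue.Queue` objects stand in for
the communication channels. This is an emulation under stated assumptions, not
an MPP program; in particular, the queues are buffered, whereas channels in
Hoare's CSP are synchronous. All names are hypothetical.

```python
import queue
import threading

END = None  # end-of-stream marker sent over a channel


def producer(out_channel, items):
    """Sequential task 1: send every item, then signal the end."""
    for item in items:
        out_channel.put(item)
    out_channel.put(END)


def squarer(in_channel, out_channel):
    """Sequential task 2: receive, compute, and forward results."""
    while True:
        item = in_channel.get()
        if item is END:
            out_channel.put(END)
            return
        out_channel.put(item * item)


def run_pipeline(items):
    a, b = queue.Queue(), queue.Queue()     # two communication channels
    tasks = [threading.Thread(target=producer, args=(a, items)),
             threading.Thread(target=squarer, args=(a, b))]
    for t in tasks:
        t.start()
    results = list(iter(b.get, END))        # receive until the end marker
    for t in tasks:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3, 4]))
```

The tasks share no variables; all data flow is explicit in the `put` and `get`
calls, which is the defining property of the message-passing model.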




Fig. 9. Mapping of the process graph onto the MPP

The efficiency of the parallel application is essentially determined by the
quality of mapping the process graph with its communication edges onto the
underlying distributed memory architecture (see Fig. 9). In the ideal case each
task gets its own processor, and every communication channel corresponds to
a direct physical link between the two communicating nodes.

As far as the number of available processors is concerned, this can be
realized in most cases in massively parallel systems. However, scalability
requires a relatively simple communication network (2D or 3D grid, ring,
torus), so at this point compromises are unavoidable. For instance, a logical
communication channel must be routed when it passes through one or more grid
points. This transfer of data takes time, especially if there is no hardware
support and the routing must be done by software emulation.

On the one hand, communication paths with different delays arise by nonop- 
timal mapping of communication channels onto the network. On the other hand, 
several logical channels are multiplexed on one physical link. From the applica- 
tion programmer’s point of view, the usable communication bandwidth is de- 
creased. 
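
Both effects can be made concrete with a small sketch. The code below places
four tasks that communicate in a cycle onto a 2 x 2 grid and counts, for two
placements, the total number of hops and the number of logical channels
sharing each physical link. The dimension-ordered (x first, then y) routing
and all names are assumptions made for this illustration; real machines may
route differently.

```python
from collections import Counter


def xy_route(src, dst):
    """Links visited when routing first along x, then along y."""
    x, y = src
    links = []
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def link_load(channels, placement):
    """Number of logical channels multiplexed on each physical link."""
    load = Counter()
    for src_task, dst_task in channels:
        for a, b in xy_route(placement[src_task], placement[dst_task]):
            load[tuple(sorted((a, b)))] += 1   # links are undirected
    return load


# Process graph: four tasks communicating in a cycle A-B-C-D-A.
CHANNELS = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
GOOD = {"A": (0, 0), "B": (0, 1), "C": (1, 1), "D": (1, 0)}
POOR = {"A": (0, 0), "B": (1, 1), "C": (0, 1), "D": (1, 0)}

if __name__ == "__main__":
    for name, placement in (("good", GOOD), ("poor", POOR)):
        load = link_load(CHANNELS, placement)
        print(name, "total hops:", sum(load.values()),
              "max channels per link:", max(load.values()))
```

In the good placement every channel is a direct link; in the poor one some
channels need two hops and pairs of channels share a link, which halves the
bandwidth each of them can use.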

Since the beginning of the development of MPP, algorithms have emerged for
various application classes with static problem sizes that find the optimal
mapping scheme for a given machine topology and thus allow the best
exploitation of hardware performance. Applying the identical transformation to
other topologies often entails a loss of efficiency. That is why porting a
parallel application requires at least some basic knowledge of the target
architecture on the part of the programmer.

In recent years research activities have been extended to the development of
adaptive parallel algorithms, i.e., to those application classes for which the
process graph is adapted to the problem size dynamically. The decision of how
to embed the actual process graph into the processor graph cannot be made
statically at compile time but only at runtime. Newly created tasks should be
placed on processors with less workload to ensure load balance. In addition,
the communication paths to other tasks should be kept as short as possible and
not be overloaded by existing channels.
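
The first of these rules, placing each new task on the currently least-loaded
processor, can be sketched as a simple greedy scheme. The example is a
deliberately reduced illustration: it ignores the length and load of
communication paths, and the task names and cost estimates are hypothetical.

```python
import heapq


def place_tasks(task_costs, n_processors):
    """Assign each newly created task to the least-loaded processor.

    task_costs: iterable of (task_name, estimated_cost) in creation order.
    Returns (placement dict, list of final processor loads).
    """
    heap = [(0, p) for p in range(n_processors)]   # (current load, processor)
    heapq.heapify(heap)
    placement = {}
    for task, cost in task_costs:
        load, proc = heapq.heappop(heap)           # least-loaded processor
        placement[task] = proc
        heapq.heappush(heap, (load + cost, proc))
    loads = [0] * n_processors
    for load, proc in heap:
        loads[proc] = load
    return placement, loads


if __name__ == "__main__":
    tasks = [("t0", 4), ("t1", 3), ("t2", 2), ("t3", 1)]
    placement, loads = place_tasks(tasks, n_processors=2)
    print(placement, loads)
```

A runtime system that also accounts for communication would add a distance
term to the key used for choosing the processor.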

Those highly complex decisions can no longer be made by the application 
programmer alone. At this point the operating and runtime system of the MPP 
has to provide suitable process management functionalities (e.g., for obtaining 
status information on the current system workload, task placement, and migra- 
tion facilities) to support the programmers in their difficult job. 



9 Summary 

Parallel machines can be classified as multiprocessors and massively parallel
systems. These classes differ in the scale of parallelism and the memory
architecture. Multiprocessors have equal access to a global shared memory and
are thus limited in the number of processors, whereas massively parallel
systems are scalable to hundreds or even thousands of processors, with each
node having its own local memory.

The hardware architecture determines the way in which the parallel computer
is programmed. For multiprocessors the multithreading programming model is
preferred. Parallelism in application programs is expressed here as the
cooperation of several threads of control that share some global data and
synchronize with each other. Message passing is used in distributed-memory
systems. Processes are executed in parallel on different processor nodes and
communicate over channels.

The ratio of communication to computation in a parallel program is decisive
for its efficiency. Massively parallel computers provide a high computational
power but typically have a lower communication bandwidth so that I/O intensive
applications probably achieve poor performance. For this class of applications
multiprocessors with their extremely low communication costs would be better
suited. Application programmers have to keep these facts in mind when
implementing their algorithms on a target architecture.

Abstract. There are several ways to write a parallel program. The parallel
programming style depends on the architecture of the parallel machine, including
the programming environment, the operating system, and the computer hardware.
In this chapter we briefly introduce the three frequently used programming
models for multiprocessor systems, illustrated by three examples of programming
environments. The program examples should develop a feeling for the kinds of
parallel programming.



1 Introduction 

The present-day status of the field of parallel programming is suitably
characterized by A. Reuter, leader of the Institute for Parallel and Distributed
High Performance Computing at the University of Stuttgart (Germany):

“Parallel programming is difficult - actually it is not difficult, but when it
should be effective it is really difficult.”

Parallel computers open a new dimension and a new world of applications. These 
new facilities are expensive, both in scientific-technical and commercial terms, 
because the hardware costs are high and the new algorithms and software tools 
are still in development. 

The education in parallel computing is only now beginning. Today we offer 
at the Technical University of Chemnitz a specialized education profile “Parallel 
and Distributed Systems” with practical work on different parallel computers 
such as Parsytec Multicluster and KSR-1. 

2 Programming Models 

2.1 Definition 

Before translating a problem into a parallel program, the programming model 
of the computer to be used must be known. This model is dependent on the 
architecture of the parallel machine. It influences the choice of the algorithm 
to be implemented. For an effective solution a well-known algorithm may be 
parallelized. In many cases the development of a new algorithm is needed. The 
programming model is characterized by Burkhardt [1]: 

A programming model or programming style is the set of features that
describe a task for its solution on a computer. The features are:

• active elements (commands, processes, threads, objects, ...),

• types of communication and synchronization (message passing, global
variables, tuple space, logical shared variables, ...), buffers, object inheritance,

• guarded commands (entry conditions for certain program parts, depending on
various instruction streams),

• matching of active elements to processors (mapping, load balance),

• behavior in case of errors (exceptions, redundancy).

2.2 Classification 

The classification of parallel programming models may follow many points of 
view. Various classification patterns overlap. This computer-model oriented clas- 
sification is derived from the well-known classification by Flynn: 

• SIMD (Single Instruction stream, Multiple Data stream) for pipeline and
array computers,

• MIMD-SM (Multiple Instruction stream, Multiple Data stream with Shared
Memory),

• MIMD-DM (Multiple Instruction stream, Multiple Data stream with
Distributed Memory).

The SIMD model means that the same operation works on various data. For SIMD
computers, specialized compilers (e.g., for FORTRAN) are normally available.
The user may specify array operations, which will be executed in parallel. The
members of this class are vector supercomputers, such as the VPP by Fujitsu and
various Cray computers.

The MIMD model with shared memory has a single address space over all
processor nodes. Each active element may access all data. No explicit
programming of communication between the active elements is needed. The
synchronization of parallel processes or threads is the programmer’s only
task. However, the shared address space complicates the maintenance of data
consistency. This problem is solved by compiler systems or special hardware
components.

The shared memory is either a global shared memory or a distributed shared
memory. Distributed shared memory means a realization through hardware as
Virtual Shared Memory (VSM) or through software as Logical Shared Memory
(LSM). Important examples are the VSM computers of KSR (Kendall Square
Research) and the T3D (Cray).

Although the programmer works with a single address space, knowledge
of the memory structure is necessary to produce efficient code.

MIMD models with distributed memory use only local data. Such systems are
programmed by message passing, based on the CSP model (communicating
sequential processes) by Hoare [4]. This model determines the kind of
communication and synchronization, but not a particular hardware base. Support
of applications on heterogeneous hardware bases could, for example, be realized
with an overlaid software system such as PVM (parallel virtual machine).

As an extension to the CSP model, the SPMD model (single program, multiple
data stream) works in a similar way to the SIMD model. An identical program
code is loaded on all processor nodes and works with various data. In contrast
to SIMD, in the SPMD model several program paths can be taken. A typical example
is the GC-Power-Plus (Parsytec).
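
The SPMD idea can be sketched as follows: one function plays the role of the
identical program, every instance receives its own rank and works on its own
part of the data, and the rank decides which program path is taken. Threads on
one machine stand in for the processor nodes here; the names `spmd_program`
and `run_spmd` are hypothetical.

```python
import threading


def spmd_program(rank, size, data, results):
    """The single program that every node executes on its own data slice."""
    my_part = data[rank::size]            # each rank sees different data
    partial = sum(my_part)
    if rank == 0:
        # Program path taken only by rank 0.
        results[rank] = ("root", partial)
    else:
        # Program path taken by all other ranks.
        results[rank] = ("worker", partial)


def run_spmd(size, data):
    results = {}
    nodes = [threading.Thread(target=spmd_program,
                              args=(rank, size, data, results))
             for rank in range(size)]
    for n in nodes:
        n.start()
    for n in nodes:
        n.join()
    return results


if __name__ == "__main__":
    print(run_spmd(4, list(range(10))))
```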

A special kind of parallel computer is the homogeneous or heterogeneous
workstation cluster. The programming of such systems is similar to programming in
the MIMD-DM model. A commonly accepted approach is the Message Passing
Interface MPI [7], based on well-known communication libraries such as PVM
[3] or PARMACS [2].

3 Programming a Shared Memory Computer 

3.1 The KSR Programming Model 

The KSR is controlled by its own operating system, the KSR-OS, based on the
Mach kernel and similar to UNIX System V.

The KSR shared memory is based on a physically distributed memory, composed
of a collection of local caches. A special hardware engine, the ALLCACHE™,
interconnects the local caches, provides routing and directory services,
supports a unified address space, and guarantees the maintenance of data
consistency.

The carriers of parallelism are pthreads (POSIX threads), started by an
active instance (main program or other pthread). The operating system
distributes pthreads across processors automatically. The user may also
distribute pthreads explicitly. Multiple pthreads on one processor run in
time-sharing mode.

Data exchange among processor nodes is realized with shared variables. Each
active element knows the same shared variable under the same name. When an
active element accesses shared data that is not in the local memory, the
ALLCACHE searches other local memories with a hierarchical strategy. That is
why the user’s only task is to synchronize some actions. However, working with
local data is a necessary condition for producing efficient code because each
data exchange decreases the system performance.

A key issue in writing parallel programs in the MIMD-SM style is controlled
data sharing in parallel domains. Outside a parallel domain, only the program
master pthread executes, and it always uses its own, private copy of data.
Within a parallel domain, the team of pthreads that executes the domain uses
either private or shared copies of data:

1. Private data: Each pthread accesses its own copy of data. A variable ref- 
erence to private data is interpreted as a reference to a unique memory location, 
different for each team member. 

2. Shared data: Pthreads share access to one copy of the data entity. All 
variable references are interpreted as references to the same memory location. 
All shared data is defined at or before link time and allocated by the operating
system once, at process start-up. 
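The distinction between private and shared data can be tried out with plain POSIX threads. The following is a minimal sketch, not KSR or PRESTO code: the names `run_team`, `member`, and `shared_sum` are made up for illustration, and a mutex stands in for whatever synchronization utility a real program would use. Each team member works on variables on its own stack (private) and touches the single shared variable only once, under a lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_TEAM 16

/* Shared data: one copy, visible to every pthread of the team. */
static long shared_sum;
static pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each team member receives its logical number through arg. */
static void *member(void *arg)
{
    int mynum = *(int *)arg;   /* private: lives on this pthread's stack */
    long partial = 0;          /* private partial result */

    for (int i = 1; i <= 100; ++i)
        partial += mynum;      /* works on private data only */

    /* Only the update of the shared variable needs synchronization. */
    pthread_mutex_lock(&sum_lock);
    shared_sum += partial;
    pthread_mutex_unlock(&sum_lock);
    return NULL;
}

/* Start a team of nthreads pthreads and return the shared result. */
long run_team(int nthreads)
{
    pthread_t tid[MAX_TEAM];
    int num[MAX_TEAM];

    assert(nthreads >= 1 && nthreads <= MAX_TEAM);
    shared_sum = 0;
    for (int t = 0; t < nthreads; ++t) {
        num[t] = t;
        pthread_create(&tid[t], NULL, member, &num[t]);
    }
    for (int t = 0; t < nthreads; ++t)
        pthread_join(tid[t], NULL);
    return shared_sum;
}
```

Calling `run_team(4)` adds the four private partial sums 0, 100, 200, and 300 into the shared variable. Keeping the loop on private data and locking only the final update follows the advice above that working with local data is a precondition for efficient code.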

3.2 Levels of Parallelism 

The KSR allows parallelism on various levels: 

• Several users may log in simultaneously. 

• Each user may open several sessions. 

• Each session allows many applications. 

• An application can contain many processes (with own address space). 

• A process may start several pthreads, which share the address space of the process. 



3.3 Program Implementation 

The KSR provides the languages Fortran and C with extensions to describe 
parallel instances. Parallelizing is supported on various levels (Table 1). 



Table 1. Parallelizing tools on the KSR

                                 Fortran 77          C
fully automatic parallelizing    KAP                 -
semi automatic parallelizing     PRESTO              SPC
manual parallelizing             library functions   library functions

Fully automatic parallelizing lets the user gain the advantages of parallel
execution without needing to know how to parallelize the program. KAP (KSR
automatic parallelizer) is a precompiler that identifies parallel parts in
loops and produces parallel source code by tiling the iteration space. KAP
inserts the appropriate tiling directives, complete with all tiling parameters.

PRESTO is a programming methodology and run-time system for high-level
parallel programming on the KSR. Most PRESTO constructs apply to Fortran
programs. C users are supported by SPC, a simple C interface to PRESTO.

PRESTO provides four major constructs: 



• Parallel regions - execute multiple instances of a code segment in parallel. 

• Parallel sections - execute various code segments in parallel. 

• Tile families - execute loops in parallel. 

• Affinity regions - coordinate tiling decisions made for a group of tile families 
referencing the same data. 

For manual parallelizing the explicit use of pthread functions and
synchronization utilities is needed. The pthread libraries are part of the
operating system and provide statements for creating, terminating, and
managing pthreads and teams of pthreads.



3.4 Examples 

The first example shows the multiplication of two matrices. Because the
elements of the target matrix can be calculated independently of one another,
no synchronization is needed. The source program is translated by KAP, which
inserts control statements into the source code as Fortran comments (C*ksr*).
The strategy used is tiling: the iteration space is divided into tiles, and
the tiles are assigned to pthreads. PRESTO finds an efficient partition of the
work, depending on the number and load of the processors. Because the shared
memory is based on a physically distributed memory, the tile size is specified
in multiples of 128-byte subpages. Figure 1 shows a tiling of 64 rows and 16
columns in the dimensions i and j. Each task calculates 64 x 16 = 1024
iterations. The variables i and j are shared. Because k is not tiled, KAP
declares it as private. Therefore each pthread works with a private copy of k
(Fig. 1).
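The tiling just described can be sketched in C. This is an illustration under stated assumptions rather than what KAP generates: the function names `tiles_along` and `matmul_tiled` are hypothetical, the matrices are stored row by row, and the tiles are executed one after another instead of being handed to pthreads. What it does show is that every 64 x 16 tile of the (i, j) iteration space is an independent unit of work, while the k loop stays untiled inside each tile.

```c
#include <assert.h>

#define TILE_I 64   /* rows per tile, as in the example above */
#define TILE_J 16   /* columns per tile */

/* Number of tiles needed to cover n iterations with tiles of size t. */
int tiles_along(int n, int t)
{
    assert(n > 0 && t > 0);
    return (n + t - 1) / t;
}

/*
 * Multiply two n x n matrices stored row by row, visiting the (i, j)
 * iteration space tile by tile. Each tile could be given to a pthread;
 * here the tiles simply run in sequence.
 * Returns the number of tiles processed.
 */
int matmul_tiled(int n, const float *a, const float *b, float *c)
{
    int ntiles = 0;

    for (int i0 = 0; i0 < n; i0 += TILE_I)
        for (int j0 = 0; j0 < n; j0 += TILE_J) {
            int i1 = i0 + TILE_I < n ? i0 + TILE_I : n;
            int j1 = j0 + TILE_J < n ? j0 + TILE_J : n;

            for (int i = i0; i < i1; ++i)
                for (int j = j0; j < j1; ++j) {
                    float sum = 0.0f;
                    for (int k = 0; k < n; ++k)   /* k is not tiled */
                        sum += a[i * n + k] * b[k * n + j];
                    c[i * n + j] = sum;
                }
            ++ntiles;
        }
    return ntiles;
}
```

For n = 256 this gives 4 x 16 = 64 tiles of 1024 iterations each. If the tiles were spread evenly over eight pthreads, each would process eight tiles; how PRESTO actually distributes them depends on the processor load and is not modeled here.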



[Fig. 1. Panels: data space, iteration space; axis label: j]




      Program matmul
C shared variables
      real a(:,:), b(:,:), c(:,:), t1, t2
      integer n, np

      print *, 'Enter linear dimension of matrices:'
      read(*,*) n
      print *, 'Enter number of pthreads:'
      read(*,*) np

C allocate is a KSR specific statement
      allocate( a(n,n), b(n,n), c(n,n) )

C create a team of pthreads with a user specified number of
C team members np, id is specified by PRESTO
      call ipr_create_team(np, id)

C simple initialization of matrices
      do 10 i=1,n
      do 10 j=1,n
         a(i,j)=1.0
         b(i,j)=1.0
 10   continue

      t1 = all_seconds()

C the following directive is included by KAP
C*ksr* tile(i,j,private=k,teamid=id)
C k is private for the pthread
      do 20 i=1,n
      do 20 j=1,n
         c(i,j)=0.0
      do 20 k=1,n
         c(i,j) = c(i,j) + a(i,k) * b(k,j)
 20   continue
C*ksr* endtile

      t2 = all_seconds()
      print *, 'Calculation Time ', t2-t1
      end



Result: 

Enter linear dimension of matrices: 256 
Enter number of pthreads: 1 
Calculation Time 8.8698763999855146 
Enter linear dimension of matrices: 256 
Enter number of pthreads: 8 
Calculation Time 1.1960556000121869 



The evaluation compares the run on one processor (a sequential program) 
with a run on eight processors. The result is a nearly linear speedup. That 
means the parallelization is effective (only a small overhead is incurred by creat- 
ing pthreads). Furthermore, the problem is easy to parallelize because no com- 
munication or synchronization is needed. 
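The phrase "nearly linear" can be made quantitative from the two timings printed above. With T_1 the time for one pthread and T_8 the time for eight pthreads, the speedup S and the parallel efficiency E (speedup per pthread) are:

```latex
S = \frac{T_1}{T_8} = \frac{8.870\ \mathrm{s}}{1.196\ \mathrm{s}} \approx 7.42,
\qquad
E = \frac{S}{8} \approx 0.93 .
```

An ideal linear speedup would give S = 8 and E = 1, so roughly 7 % of the eight-pthread run is overhead.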

The second example shows the use of PRESTO constructs. The program computes
two vectors from a two-dimensional matrix (Fig. 2). The sparse matrix is
represented by the array sm. The sums of the elements in the rows (dvr) and in
the columns (dvc) are computed by two different code sequences (parallel
sections). Within each section, the individual rows or columns are processed
by multiple instances of the same code (parallel regions). The program thus
solves the problem with parallel regions (each starting a team of four
pthreads) nested in two parallel sections. The variables i and j are private
because both sections use i and j. A logical pthread number (mynum) is used to
assign pthreads to row or column indices.
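The cyclic assignment of indices to team members can also be written with plain POSIX threads. The sketch below is an analogue of the scheme described above, not a translation of the PRESTO directives: `row_member`, `col_member`, and `row_and_column_sums` are illustrative names, indices are 0-based as usual in C, and two teams of four pthreads run side by side in place of the two parallel sections. Because each row and each column is owned by exactly one team member, no locking is needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define N 10      /* matrix dimension */
#define TEAM 4    /* team members per parallel region */

static double sm[N][N], dvr[N], dvc[N];   /* shared data */

/* Team member mynum sums the rows i with i % TEAM == mynum. */
static void *row_member(void *arg)
{
    int mynum = *(int *)arg;               /* private */
    assert(mynum >= 0 && mynum < TEAM);
    for (int i = 0; i < N; ++i)
        if (i % TEAM == mynum)
            for (int j = 0; j < N; ++j)
                dvr[i] += sm[i][j];
    return NULL;
}

/* Team member mynum sums the columns j with j % TEAM == mynum. */
static void *col_member(void *arg)
{
    int mynum = *(int *)arg;               /* private */
    assert(mynum >= 0 && mynum < TEAM);
    for (int j = 0; j < N; ++j)
        if (j % TEAM == mynum)
            for (int i = 0; i < N; ++i)
                dvc[j] += sm[i][j];
    return NULL;
}

/* Fill sm as in the Fortran program, then run both teams concurrently. */
void row_and_column_sums(void)
{
    pthread_t rows[TEAM], cols[TEAM];
    int num[TEAM];

    for (int i = 0; i < N; ++i) {
        dvr[i] = dvc[i] = 0.0;
        for (int j = 0; j < N; ++j)
            sm[i][j] = 10 * i + (j + 1);   /* 10*(i-1)+j for 1-based i, j */
    }
    for (int t = 0; t < TEAM; ++t) {
        num[t] = t;
        pthread_create(&rows[t], NULL, row_member, &num[t]);
        pthread_create(&cols[t], NULL, col_member, &num[t]);
    }
    for (int t = 0; t < TEAM; ++t) {
        pthread_join(rows[t], NULL);
        pthread_join(cols[t], NULL);
    }
}
```

After `row_and_column_sums()` returns, `dvr` holds the row sums 55, 155, ..., 955 and `dvc` the column sums 460, 470, ..., 550.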





Fig. 2. Compute sums of rows and columns 



      Program ParSecReg
c shared data
      common sm(10,10), dvc(10), dvr(10)
      integer procid, numproc, mynum, tid

      do 10 i = 1,10
         dvr(i)=0
         dvc(i)=0
 10   continue

      do 20 i = 1,10
      do 20 j = 1,10
         sm(i,j) = 10 * (i-1) + j
 20   continue

      call ipsm_init(numproc)
      print*, 'number of available processors: ', numproc

c*ksr* parallel sections (private=(i,j,procid))
c*ksr* section
c*ksr* parallel region (numthreads = 4, private=(i,j,procid,mynum))
      procid = ipsm_mypset()
      tid = ipr_tid()
      mynum = ipr_mid()
      print*, 'dvr is running on processor', procid, 'tid=', tid,
     &        'team_member', mynum
c each team member sums the rows i assigned to it
      do 40 i=1,10
      do 40 j=1,10
         if (mod(i,ipr_tsize()) .eq. mynum) then
            dvr(i) = dvr(i) + sm(i,j)
         end if
 40   continue
c*ksr* end parallel region
c*ksr* section
c*ksr* parallel region (numthreads = 4, private=(i,j,procid,mynum))
      procid = ipsm_mypset()
      tid = ipr_tid()
      mynum = ipr_mid()
      print*, 'dvc is running on processor', procid, 'tid=', tid,
     &        'team_member', mynum
c each team member sums the columns j assigned to it, so that
c no two pthreads update the same element of dvc
      do 50 i=1,10
      do 50 j=1,10
         if (mod(j,ipr_tsize()) .eq. mynum) then
            dvc(j) = dvc(j) + sm(i,j)
         end if
 50   continue
c*ksr* end parallel region
c*ksr* end parallel sections

      do 60 i=1,10
         print*, 'i:', i, 'dvr:', dvr(i), 'dvc:', dvc(i)
 60   continue
      end



Result:

number of available processors: 8
dvr is running on processor 7 tid= 2 team_member 0
dvr is running on processor 2 tid= 2 team_member 2
dvr is running on processor 1 tid= 2 team_member 1
dvr is running on processor 3 tid= 2 team_member 3
dvc is running on processor 0 tid= 3 team_member 0
dvc is running on processor 4 tid= 3 team_member 1
dvc is running on processor 6 tid= 3 team_member 3
dvc is running on processor 5 tid= 3 team_member 2
i: 1 dvr: 55.000000000000000 dvc: 460.00000000000000
i: 2 dvr: 155.00000000000000 dvc: 470.00000000000000
i: 3 dvr: 255.00000000000000 dvc: 480.00000000000000
i: 4 dvr: 355.00000000000000 dvc: 490.00000000000000
i: 5 dvr: 455.00000000000000 dvc: 500.00000000000000
i: 6 dvr: 555.00000000000000 dvc: 510.00000000000000
i: 7 dvr: 655.00000000000000 dvc: 520.00000000000000
i: 8 dvr: 755.00000000000000 dvc: 530.00000000000000
i: 9 dvr: 855.00000000000000 dvc: 540.00000000000000
i: 10 dvr: 955.00000000000000 dvc: 550.00000000000000



4 Programming a Distributed Memory Computer 
Using PARIX 

4.1 What is PARIX 

PARIX is a parallel operating environment consisting of software components
for machine administration, application development, and execution by the
user. Some components of PARIX reside on a front-end machine (host), while
others run on the parallel target system itself (Fig. 3). The name PARIX
stands for parallel extensions to UNIX, which already expresses its basic
concepts:



- PARIX is based on UNIX, 

- the user writes the parallel application by using a high-level language (C, 
FORTRAN) complemented with parallel extensions. 



[Figure labels: Parallel Extensions, UNIX Operating System (front end); Parallel Extensions (parallel machine)]

Fig. 3. PARIX software layer model 



The parallel extensions are provided by message-passing libraries. PARIX has
been designed for distributed memory multiprocessor machines ranging from two
up to (theoretically) thousands of processors (Parsytec GC, Xplorer,
Multicluster architecture) [10].

4.2 PARIX Hardware Environment 

The front-end computer (host), usually a SUN-compatible workstation, supplies
the user with a development environment and gives parallel applications access
to external facilities such as file servers, networks, and graphics. The
parallel machine consists of computing units and communication networks only
and has no direct support for these facilities. Access to them is provided
through remote procedure calls (RPCs): all services not supported by the
parallel machine itself are shifted to the host.

In order to permit many users to access the parallel machine at the same time,
each user usually gets a part of the whole machine. In PARIX terminology this
part or tile of the parallel machine is called a partition. It is one of the
jobs of the system administrator to define partitions of different sizes and
locations during installation. The processors of a partition are arranged as a
grid. Naturally it is possible to define a partition that contains all
processors.



4.3 Communication and Process Model Under PARIX 

The PARIX programming model is for the most part CSP-like. The concept of
communicating sequential processes (CSP) rests on the following basics: In a
message-passing system, a process always remains within its own address space.
Communication between two processes is realized as message transfer via a
communication channel. The data transfer has to be initiated explicitly. This
model requires two different mechanisms for memory access:

1. The entire local memory can be accessed using normal operations (individual
   or local variables).

2. “Remote” memory access requires interprocessor communication. 

Under PARIX, a communication channel is a bidirectional, synchronizing,
point-to-point communication line between two processes and is called a link.
Before a link can be used it has to be built up explicitly by calling a
connection management function on both sides. During link construction each
process specifies its communication partner by a processor number and gives
the link a name. Because address space as well as name space are local, each
process can name the link independently, so the names may differ on the two
sides of a connection. In a later data transfer the processes do not specify
the communication partner by the destination process but by the named link
(Fig. 4).

The server process is responsible for processing remote procedure calls (RPC),
for example a simple printf() in C. The RPC channel is the only one that is
predefined by the PARIX software.

[Figure labels: parallel program, server process]




Fig. 4. Communication model of PARIX 



4.4 Programming Model 

Local Memory. Each processor has its own local memory which is not shared 
by other processors in the system. Access to remote memory requires data trans- 
fer that has to be explicitly initiated. 



Main Program Loaded onto all Nodes. Independent of the number of 
requested processors it is sufficient to write only one program. PARIX includes 
an initial booting and loading mechanism which distributes an identical main 
program to all nodes of an allocated partition. Therefore there is no need for 
explicit mapping instructions that describe how to assign processes to processors. 
As explained later the user may influence this process indirectly by writing his 
program in such a manner that the software communication network is optimally 
assigned to the hardware network. 



Identification of the Individual Position Within the Network. The PARIX
run-time system initializes a set of global data (called the root structure in
PARIX terminology) kept on each node. The root structure provides information
such as the node position within the 2D grid and the size of the whole
partition. The most important information stored in the root structure is a
unique processor number, called ProcID, in the range 0 to (number of
processors - 1). Based on the ProcID the user may assign different
instructions to certain processors.

In this way it is possible to branch into different program sections (multiple
instruction multiple data) or even to load new executables (multiple program
multiple data).

/* Determine the number of this processor */
int myprocid = GET_ROOT()->ProcRoot->MyProcID;

if (myprocid == 0)
    do_this();   /* Processor zero executes this part */
else
    do_that();   /* all other processors execute this part */



Virtual Links. A virtual link is a bidirectional virtual channel for
communication between two processes. The links are called virtual because
there is no need for a direct hardware connection between the communicating
processors. This concept overcomes the restriction imposed by the limited
number of physical links per processor. It makes no difference whether the
processes are located on the same processor or on two arbitrary processors.

With the knowledge (obtained from the root structure) about the positions of 
processors in the physical 2D grid of the requested partition, the user can build 
up the optimal virtual links to achieve the highest communication performance 
(e.g., Fig. 5). But this is a question of optimization and should interest only 
the “power user”. Usually the user defines his communication network to be 
hardware independent. 




Fig. 5. Optimized mapping of a logical ring on a partition 



4.5 An Example, PARIX says “Hello World” 

/* further explanations to this program follow later */
#include <sys/root.h>
#include <sys/link.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TAG 17

int main (void)
{
    int nProcs;
    int myID;
    int Slave, Buffer, error;

    /* LinkCB_t (Link Control Block) is a PARIX specific data */
    /* structure to hold information about a virtual link */
    LinkCB_t *Link2Master;
    LinkCB_t *Links2Slaves[4];

    myID = GET_ROOT()->ProcRoot->MyProcID;   /* Who am I */
    nProcs = GET_ROOT()->ProcRoot->nProcs;   /* How many procs */

    if (myID > 3) return 0;   /* all processors with myID > 3 say good bye */

    if (myID == 0) {   /* Master, processor 0 executes this part */
        printf("Hello World from Master\n");
        if (nProcs > 4)
            nProcs = 4;   /* adaption of nProcs, if there are released processors */
        for (Slave = 1; Slave < nProcs; ++Slave) {
            /* Build up the links to the slave processors */
            Links2Slaves[Slave] = ConnectLink(Slave, TAG, &error);
            if (Links2Slaves[Slave] == NULL)
                exit(1);   /* You should print out an error message
                              before doing this */
        }
        for (Slave = 1; Slave < nProcs; ++Slave) {
            RecvLink(Links2Slaves[Slave], &Buffer, sizeof(int));
            printf("Hello World from Slave %d\n", Buffer);
        }
    }
    else {   /* Slaves, processors 1,2,3 execute this part */
        Link2Master = ConnectLink(0, TAG, &error);
        if (Link2Master == NULL)
            exit(1);   /* You should print out an error message
                          before doing this */
        SendLink(Link2Master, &myID, sizeof(int));
    }
    return 0;
}

First, all processors find out which number they have and how many processors
were requested. Suppose we want to use only four processors. For that reason
all processors with myID greater than 3 are released from work (0, 1, 2, 3 are
still in the race). The further execution depends on myID. Processor zero,
called the master, branches into the true section (the range between
if (myID == 0) and else). The remaining processors, called slaves, execute the
instructions between else and return.

In order to communicate with all slaves the master has to establish three
links, one each to processors 1, 2, and 3. This is done by calling
ConnectLink() in a loop. ConnectLink() takes as arguments the destination
processor (1..3), an identifier (TAG), and an error variable. The identifier
TAG is used to distinguish different links to the same processor. On the
master side the links are placed in an array of link control blocks (LinkCB_t)
and labeled Links2Slaves[1], Links2Slaves[2], Links2Slaves[3]. The entry
Links2Slaves[0] remains unused. Each slave builds up one link to the master;
they all name the connection Link2Master. If no errors occur, all links are
established at this point and the processes are ready to communicate.

After that all slaves send their identifier (myID) to the master by calling
SendLink(). The arguments of SendLink() are the named communication channel
Link2Master, the address of a buffer containing the data to be sent, and the
number of bytes of the message. By calling RecvLink() the master receives the
incoming messages. The slaves are served in order of their processor number.
The communication happens synchronously: both partners have to be ready for
the data transfer. If one of the processes is busy, the other waits until the
first is ready.

Finally the master prints a message that contains the received processor
number.

The output should look something like this:

Hello World from Master 
Hello World from Slave 1 
Hello World from Slave 2 
Hello World from Slave 3 
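Readers without access to a PARIX system can try the same master/slave exchange with standard POSIX calls. The sketch below is only a stand-in under clearly different assumptions: `socketpair()` replaces the virtual link, `fork()` replaces the loading of the program onto several nodes, and `read()`/`write()` replace RecvLink()/SendLink(). Unlike a PARIX link, a socket buffers data, so the sender does not wait for the receiver. The function name `collect_ids` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_SLAVES 8

/*
 * Master side: create one channel per slave, start the slaves, and read
 * one int from each channel in order of slave number.
 * Returns 0 on success and stores the received numbers in ids[0..n-1].
 */
int collect_ids(int nslaves, int *ids)
{
    int link[MAX_SLAVES][2];   /* [0]: master end, [1]: slave end */
    pid_t pid[MAX_SLAVES];
    int status = 0;

    assert(nslaves >= 1 && nslaves <= MAX_SLAVES && ids != NULL);
    for (int s = 0; s < nslaves; ++s) {
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, link[s]) != 0)
            return -1;
        pid[s] = fork();
        if (pid[s] < 0)
            return -1;
        if (pid[s] == 0) {                  /* slave process */
            int myid = s + 1;
            close(link[s][0]);
            if (write(link[s][1], &myid, sizeof myid)
                    != (ssize_t)sizeof myid)
                _exit(1);
            _exit(0);
        }
        close(link[s][1]);                  /* master keeps only its end */
    }
    for (int s = 0; s < nslaves; ++s) {     /* serve slaves in order */
        if (read(link[s][0], &ids[s], sizeof ids[s])
                != (ssize_t)sizeof ids[s])
            status = -1;
        close(link[s][0]);
        waitpid(pid[s], NULL, 0);
    }
    return status;
}
```

As in the PARIX program, each side addresses its partner through its own end of the channel, and the master serves the slaves in order of their number regardless of which slave wrote first.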



5 Programming Heterogeneous Workstation Clusters 
Using MPI 

5.1 Introduction 

Message passing is a widely used paradigm for writing parallel applications.
However, there are nearly as many variants of this paradigm as there are
hardware platforms, because each vendor offers its own communication
primitives. As a result it is very difficult to port an application written
for a particular system to another manufacturer's hardware. To solve this
problem, numerous attempts have been made to propose a standard (Express, PVM,
NX/2). For different reasons none of these interfaces has achieved the
breakthrough of being acknowledged as a widely accepted standard. Yet the
availability of a general standard is an important key to the growing
acceptance of parallel computers in industry and research.

The required standardization process started with a workshop in 1992. Most 
of the major vendors of concurrent computers and researchers from universities, 
government laboratories, and industry were involved in the effort of developing a 
message passing interface standard, called MPI. In November 1993 the working 
group presented the draft MPI standard at the Supercomputing 93 conference. 
The main goal stated by the MPI forum is [7]: 

“To develop a widely used standard for writing message passing programs. As 
such the interface should establish a practical, portable, efficient, and flexible 
standard for message passing”. 

Other goals are: 

- to allow efficient communication (memory-to-memory copying, overlap of 
computation and communication), 

- to allow for implementations that can be used in heterogeneous environments, 

- to design an interface that is not too different from current practice, such as 
PVM, Express. 

Based on the message passing paradigm, the MPI standard is suitable for
developing programs for distributed memory machines, shared memory machines,
networks of workstations, and combinations of these. Because the MPI forum
only defines the interfaces and the contents of the message passing routines,
anyone may develop an implementation. All further explanations refer to MPICH,
the MPI implementation of Argonne National Laboratory/Mississippi State
University.

5.2 Basic Structure of MPICH 

Each MPI application can be seen as a collection of concurrent processes. In
order to use MPI functions the application code is linked with a static
library provided by the MPI software package. The library consists of two
layers. The upper layer comprises all MPI functions that have been written in
a hardware-independent way. The lower layer is the native communication
subsystem on parallel machines or another message passing system, such as p4
or PVM. p4 offers less functionality than MPI but supports a wide variety of
parallel computer systems. The MPI layer accesses the p4 layer through an
abstract device interface. In this way all hardware dependencies are kept out
of the MPI layer and the user code.

Processes with identical codes running on the same machine are called clusters
in p4 terminology. p4 clusters are not visible to an MPI application. In order
to achieve peak performance, p4 uses shared memory (if available) for all
processes in the same cluster. Special message passing interfaces are used for
processes connected by such an interface. All processes have access to the
socket interface, which is a de facto standard on all UNIX machines. Because
sockets were designed for data transmission in global networks and lack the
performance required for high-performance computers, such machines provide an
additional high-speed message passing interface (see Fig. 6). With this
structure MPI covers a wide range of platforms, from real parallel machines up
to networks of workstations [6] (Fig. 6).



[Figure labels: MPI processes 0 to 4]




Fig. 6. Basic structure of the MPI implementation MPICH 



5.3 What Is Included in MPI? 

- Point-to-point communication 

- Collective operations 

- Process groups 

- Communication contexts 

- Process topologies 

- Bindings for Fortran 77 and C 

- Environmental management and inquiry 

- Profiling interface 

5.4 What Does the Standard Exclude? 

- Explicit shared-memory operations 

- Support for task management 

- Parallel I/O functions 



5.5 MPI Says “Hello World” 

MPI is a complex system that comprises 129 functions. But a small subset of
six functions is sufficient to solve a moderate range of problems. The program
below uses this subset; only basic point-to-point communication is shown. We
decided to run this short application on a network of workstations, because
this platform is the most widely available and the most easily accessible. The
program uses the SPMD paradigm: all MPI processes run identical code (in
heterogeneous environments only the source code is common).

/* further explanations to this program follow later */
#include <mpi.h>
...

#define MASTER 0
#define TAG 1

int main(int argc, char *argv[])
{
    MPI_Status status;
    char Hostname[81];        /* This string contains later the hostnames */
    char Buffer[81] = "Me";   /* a communication buffer with preinitialization "Me" */
    int myRank, nTasks, Slave;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nTasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);

    gethostname(Hostname, 80);

    if (myRank == MASTER)
        for (Slave = 1; Slave < nTasks; Slave++)
            MPI_Send(Hostname, 80, MPI_CHAR, Slave, TAG,
                     MPI_COMM_WORLD);
    else
        MPI_Recv(Buffer, 80, MPI_CHAR, MASTER, TAG, MPI_COMM_WORLD,
                 &status);

    printf("Hello World from Host %s\t rank %d : Master is %s\n",
           Hostname, myRank, Buffer);

    MPI_Finalize();
    return 0;
}

The details of compiling this program depend on the system you have. MPI does
not include a standard for how to start the MPI processes. However, all
current MPI implementations provide a programming environment for configuring
and starting MPI applications. Under MPICH the best way to describe one's own
parallel virtual machine is a configuration file, called a process group file.
On a heterogeneous network, which requires different executables, it is the
only possible way. The process group file contains the machine name (first
entry), the number of processes to start (second entry), and the full path of
the executable program (third entry).

Example process group file hello.pg (Explanations below) 

sun_a  0  /home/jennifer/sun4/hello
sun_b  1  /home/jennifer/sun4/hello
ksr1   3  /home/jennifer/ksr/ksrhello

Supposing we call the application hello, the process group file should be
named hello.pg. To run the whole application it suffices to call hello on
workstation sun_a, which serves as a console. A start-up procedure interprets
the process group file and starts the specified processes.



sun_a > hello 



The file above specifies five processes, one on each of the two SUN
workstations and three on a KSR1 virtual shared memory multiprocessor machine.
By calling hello on the console (in this case sun_a) the user starts one
process directly. For this reason the first line of the process group file
contains zero as the number of (additional) processes, so that just one
process runs on each workstation.

The output should be:

Hello World from Host sun_a    rank 0 : Master is Me
Hello World from Host sun_b    rank 1 : Master is sun_a
Hello World from Host ksr1     rank 2 : Master is sun_a
Hello World from Host ksr1     rank 3 : Master is sun_a
Hello World from Host ksr1     rank 4 : Master is sun_a



This program demonstrates the most common method for writing MIMD 
programs. Different processes, running on different processors, can execute dif- 
ferent program parts by branching within the program based on an identifier. In 
MPI this identifier is called rank. 



MPI Framework. The functions MPI_Init() and MPI_Finalize() build the framework
around each MPI application. MPI_Init() must be called before any other MPI
function may be used. After a program has finished its MPI specific part, the
call of MPI_Finalize() takes care of a tidy cleanup. All pending MPI
activities will be canceled.

Who Am I, How Many Are We? As mentioned earlier, MPI processes are represented
by a rank. The function MPI_Comm_rank() returns this unique identifier, which
is simply a nonnegative integer in the range 0 ... (number of processes - 1).
To find out the total number of processes, MPI provides the function
MPI_Comm_size(). Both MPI_Comm_rank() and MPI_Comm_size() take the parameter
MPI_COMM_WORLD, which denotes a particular process scope, called a
communicator.

The communicator concept is one of the most important in MPI and distinguishes
this standard from other existing message passing interfaces. Communicators
provide a local name space for processes and a mechanism for encapsulating
communication operations to build up various separate communication
“universes”. That means a pending communication in one communicator never
influences a data transfer in another communicator. The initial communicator
MPI_COMM_WORLD contains all MPI processes started by the application. Other
communicators may be derived from MPI_COMM_WORLD.



[Figure labels: communicator, rank]




Fig. 7. Basic communicator structure 



Figuratively speaking, a communicator can be regarded as a cover around a
group of processes (Fig. 7). A communication operation always specifies a
communicator. All processes involved in a communication operation have to be
described by their representation on the top side of the cover (communicator
rank). There are some other MPI concepts, such as virtual topologies and
user-defined attributes, which may be coupled to a communicator. MPI does not
support a dynamic process concept: after start-up MPI provides no mechanism to
spawn new processes and integrate them into a running application.

Sending/Receiving Messages. An MPI message consists of a data part and a
message envelope. The data part is specified by the first three parameters of
MPI_Send()/MPI_Recv(), which describe the location, size, and datatype of the
send buffer. The entire data transfer is based on MPI datatypes, which
correspond to the basic datatypes of the supported languages. In this example
MPI_CHAR is used, which matches char in C. The message envelope describes the
destination, tag, and communicator of the message. The tag argument can be
used to distinguish different types of messages. By using tags the receiver
can select particular messages. In this example the master, which is process
zero, sends its host name to all other processes, called slaves. The slaves
receive this string by using MPI_Recv(). After communication is finished all
processes print their “Hello World” messages, which appear on the MPI console
(host sun_a). An easier way of obtaining the same result is to use a broadcast
operation.
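How a receiver selects a particular message by its envelope can be illustrated with a small toy model in plain C. Nothing here is part of MPI or MPICH: the structures `envelope` and `message` and the function `match_message` are hypothetical, integer codes stand in for the datatype and the communicator, and the sender's rank is stored alongside the destination because a receive call such as the MPI_Recv() above also names the source. The sketch only shows the matching rule: a queued message is delivered to a receive request when destination, source, tag, and communicator all agree.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a message envelope (hypothetical type). */
struct envelope {
    int source;   /* rank of the sender */
    int dest;     /* rank of the receiver */
    int tag;      /* user-chosen message type */
    int comm;     /* stands in for the communicator */
};

/* Toy model of a message: data part plus envelope. */
struct message {
    const void *buf;   /* data part: location ... */
    int count;         /* ... number of elements ... */
    int datatype;      /* ... and element type (an int code here) */
    struct envelope env;
};

/*
 * Return the index of the first queued message addressed to rank me
 * that matches the requested source, tag, and communicator, or -1 if
 * no such message is queued.
 */
int match_message(const struct message *queue, int n,
                  int me, int source, int tag, int comm)
{
    assert(queue != NULL && n >= 0);
    for (int i = 0; i < n; ++i) {
        const struct envelope *e = &queue[i].env;
        if (e->dest == me && e->source == source &&
            e->tag == tag && e->comm == comm)
            return i;
    }
    return -1;
}
```

With two messages from rank 0 queued for rank 1, one with tag 7 and one with tag 1, a request for tag 1 skips the first message and returns the second. This is the sense in which tags let the receiver select particular messages.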



5.6 Currently Available Implementations of MPI 

- MPICH implementation from Argonne National Laboratory/Mississippi 
State University 

- LAM implementation from the Ohio Supercomputing Center 

- CHIMP implementation from Edinburgh Parallel Computing Center 

- UNIFY implementation from Mississippi State University (subset of MPI) 

6 Summary 

This chapter presented some elementary facts about parallel programming on
three widely used parallel architectures. The concepts and models discussed
reflect the main trends in the development and application of parallel
machines. The reader has seen how parallelism is expressed on specific
architectures and how to write first parallel programs for shared memory
machines, distributed memory machines, and heterogeneous environments. Some
characteristic examples give a first impression of parallel programming.